Abstract

This paper uses combinatorics and group theory to answer questions about 
the assembly of icosahedral viral shells. Although the geometric structure of the 
capsid (shell) is fairly well understood in terms of its constituent subunits, the 
assembly process is not. For the purpose of this paper, the capsid is modeled 
by a polyhedron whose facets represent the monomers. The assembly process is 
modeled by a rooted tree, the leaves representing the facets of the polyhedron, 
the root representing the assembled polyhedron, and the internal vertices representing
intermediate stages of assembly (subsets of facets). Besides its virological
motivation, the enumeration of orbits of trees under the action of a finite group
is of independent mathematical interest. If $G$ is a finite group acting on a finite
set $X$, then there is a natural induced action of $G$ on the set $T_X$ of trees whose
leaves are bijectively labeled by the elements of $X$. If $G$ acts simply on $X$, then
$|X| := |X_n| = n|G|$, where $n$ is the number of $G$-orbits in $X$. The basic
combinatorial results in this paper are (1) a formula for the number of orbits of each
size in the action of $G$ on $T_{X_n}$, for every $n$, and (2) a simple algorithm to find
the stabilizer of a tree $\tau \in T_X$ in $G$ that runs in linear time and does not need
memory in addition to its input tree.
Figure 1: (Left) Basic viral structure. (Right) Minute Virus of Mice X-ray and monomer
structure, courtesy of [1].

1 Introduction 

Viral shells, called capsids, encapsulate and protect the fragile nucleic acid genome from 
physical, chemical, and enzymatic damage. Francis Crick and James Watson (1956) 
were the first to suggest that viral shells are composed of numerous identical protein 
subunits called monomers. For many viruses, these monomers are arranged in either 
a helical or an icosahedral structure. We are interested in those shells that possess 
icosahedral symmetry. 

Icosahedral viral shells can be classified based on their polyhedral structure, facets 
corresponding to the monomers. The classical "quasi-equivalence theory" of Caspar and 
Klug [6] explains the structure of the polyhedral shell in the case where the monomers 
have very similar neighborhoods. According to the theory, the number of facets in the 
polyhedron is $60T$, where the $T$-number is of the form $h^2 + hk + k^2$. Here $h$ and $k$ are
non-negative integers. The icosahedral group acts simply on the set of facets of the 
polyhedron (monomers of the shell). 
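As a small illustration of the Caspar and Klug formula (our own sketch, not part of the paper; the function name `t_numbers` is hypothetical), the following Python lists the admissible $T$-numbers up to a bound together with the corresponding facet counts $60T$.

```python
def t_numbers(limit):
    """Return the sorted admissible T-numbers T = h^2 + hk + k^2 <= limit.

    h and k range over non-negative integers, not both zero.
    """
    values = set()
    for h in range(limit + 1):
        for k in range(limit + 1):
            t = h * h + h * k + k * k
            if 0 < t <= limit:
                values.add(t)
    return sorted(values)


if __name__ == "__main__":
    for t in t_numbers(13):
        print(f"T = {t:2d}: {60 * t} facets")
```

The first few values are 1, 3, 4, 7, 9, 12 and 13; the $T = 1$ shells discussed below therefore have 60 facets.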

Although for many virus families the structure is fairly well understood and sub- 
stantiated by crystallographic images, the viral assembly process - just like many other 
spontaneous macromolecular assembly processes - is not well-understood, even for T = 1 
viral shells. In many cases, the capsid self-assembles spontaneously, rapidly and quite 
accurately in the host cell, with or without enclosing the internal genomic material, and 
without the use of chaperone, scaffolding or other helper proteins. This is the type of 
assembly that we consider here. See Figure 1 for the basic icosahedral structure and the
X-ray structure of a T = 1 virus.

Many mathematical models of viral shell assembly have been proposed and studied,
including [?]. Here we use the GT (geometry and tensegrity) model of [18]. In the GT
model, information about the construction (or decomposition) of the viral shell is
represented by an assembly tree. The vertices of the tree represent subassemblies that do
not disintegrate during the course of the assembly process. In an assembly tree, these
subassemblies are partially ordered by containment, with the root representing the
complete assembled structure, and the leaves representing the monomers. That is, we only
consider trees that represent successful assemblies. See Figure 2 for the nomenclature of a
T = 1 polyhedron, and Figures 3 and 4 for examples of assembly trees. Besides being
intuitive and analyzable, the GT model was shown in [18] to make rough predictions that fit
experimental and biophysical observations of known T = 1 viral assemblies, specifically
those of the viruses MVM (Minute Virus of Mice), MSV (Maize Streak Virus) and AAV4
(Adeno-Associated Virus 4).

Figure 2: (Left) Face numbers: pentamer of trimers in a T = 1 polyhedron. (Right)
Vertex numbers: trimers of pentamers in a T = 1 polyhedron.

The GT model was developed to answer questions that concern only the influence of 
two quantities on the probability of each type of assembly tree. We call these quantities 
the geometric stability factor and the symmetry factor. The higher these quantities, the 
higher the probability. 

The geometric stability factor is correlated with biochemical stability and influenced 
by assembly and disassembly energy thresholds and is defined using the effect of geomet- 
ric constraints within monomers or between monomers. These constraints are distances, 
angles and forces between the monomer residues. These can be obtained either from
X-ray crystallography or from cryo-electron microscopy data on the complete viral shell. More
specifically, the final viral structure can be viewed formally as the solution to a system 
of geometric constraints that can be expressed as algebraic equations and inequalities. 
For each internal vertex of an assembly tree, namely a subassembly, the geometric sta- 
bility factor can be computed using quantifiable properties - such as extent of rigidity 
or algebraic complexity of the configuration space of the subassembly. 
Figure 3: T = 1 valid assembly trees based on pentameric subassemblies and the nomenclature
of Figure 2; triangles at the bottom represent vertices with five leaves as children;
arrows represent the action of the icosahedral group on trees; only the long horizontal
arrows in the two figures on the right fix the corresponding trees.

Figure 4: T = 1 valid assembly trees based on trimeric subassemblies; triangles at the
bottom represent trimers.
This factor is computed by analyzing the corresponding subsystems of the given
viral geometric constraint system. It was argued in [18] that the rigidity aspect of the 
geometric stability factor can be generically expressed purely using graph theory. Some 
assembly trees can never occur (have probability zero) since the subassemblies occurring 
in them are unstable (their geometric stability factor is zero). Such assembly trees are 
geometrically invalid. 

The symmetry factor is defined as follows. The icosahedral group acts naturally on 
the set of assembly trees for a particular viral polyhedron P, whose facets (representing 
viral monomers) are the leaves of the tree, as illustrated in Figure 3. Each orbit under
this action is called an assembly pathway and corresponds intuitively to a distinct type 
of assembly process for the viral capsid. The symmetry factor is the number of assembly 
trees in the pathway divided by the total number of trees. We assume that each assembly 
tree is equally likely to occur. 

In [19], the authors observed the following attractive feature of the GT model of 
assembly. The two separate factors - geometric stability and symmetry - that influence 
the probability of the occurrence of a particular assembly pathway can be analyzed 
largely independently as follows. An obvious, but crucial, observation made in [19] is 
that both the geometric stability factor and geometric validity are invariants of the 
assembly pathway. That is, they remain the same for any assembly tree in the same 
orbit under the action of the icosahedral group. Thus the probability of the occurrence
of a pathway is roughly proportional to some combination of the symmetry factor and
the geometric stability factor. Additionally, the ratio of the orbit sizes of two trees
$\tau_1$ and $\tau_2$ could serve as a rough estimate of the ratio of the probabilities of the
corresponding assembly pathways, provided that the former ratio is not cancelled out
or reversed by the ratio of the geometric stability factors of $\tau_1$ and $\tau_2$. The paper [3]
formally proved that this kind of cancelling out would not generally take place, at least
for valid pathways, for the following reasons. First, it is shown in [3] that the symmetry
factor of a pathway increases with the depth of its representative tree $\tau$. More precisely,
it was proved formally that the size of the orbit of $\tau$ is bounded below by the depth of $\tau$.
Moreover, it is known from [18] that, provided an assembly tree is valid, the geometric
stability factor is non-zero and generally increases with the depth of the tree (and this
correlates with biophysical observations). Therefore, if the depth of $\tau_1$ is greater than
the depth of $\tau_2$, then both the symmetry factor and the geometric stability factor of $\tau_1$
will generally be larger than the corresponding factors of $\tau_2$.

1.1 Contributions and Related Work 

Based on the observations above, the paper [?] posed problems intended
to isolate and clarify the influence of the symmetry factor on the probability of the 
occurrence of a given assembly pathway. Two specific problems were the following. 

(i) Enumerate the valid assembly pathways of an icosahedrally symmetric polyhedron. 
More precisely, the problem is to determine the number of such assembly pathways 
of each orbit size. 






(ii) Characterize and algorithmically recognize the set of assembly trees fixed by a 
given subgroup of the icosahedral group. The characterization problem is a step 
toward the solution of the enumeration problem (i). Algorithmic recognition of 
the group elements that fix a given assembly tree (the stabilizer of the tree in the 
given group) directly determines whether the given assembly tree has a given orbit 
size. 

Remark. In this paper, we answer the above questions for general assembly trees, 
that is, we drop the condition of validity. Furthermore, in this paper, the geometric 
stability factor will be ignored, and thus "probability" will refer to the symmetry factor
only. As mentioned earlier, [?] show that validity of assembly trees of a polyhedron
P is not only invariant under the action of its symmetry group, but can also be captured 
by simple graph-theoretic properties such as generalized notions of connectivity for the 
graph constructed from the vertices and edges of P. We expect that the techniques 
developed in this paper will help in answering the above questions in the presence of the 
validity condition as well. 

For Problem (i), we develop an enumeration method using generating functions and
Möbius inversion. For the algorithm in Problem (ii), we provide a simple permutation
group algorithm and an associated data structure. The results of this paper hold not
just for the icosahedral group, but also for any finite group $G$ acting simply on a set $X$.
Indeed, if $G$ is a finite group acting on a set $X$, then there is a natural induced action of
$G$ on the set $T_X$ of assembly trees. These are formally defined as rooted trees $\tau$ whose
non-leaf vertices have at least two children and whose leaves are bijectively labeled by
$X$. If $G$ acts simply on $X$, then $|X| := |X_n| = n|G|$, where $n$ is the number of $G$-orbits
in $X$.

Concerning Problem (i), Pólya theory gives a convenient method for counting orbits
under a permutation group action. However, because of the complexity of the cycle index
in our situation, we were not able to apply Pólya theory to Problem (i). Similarly, the
methods used in [13] for enumerating labeled graphs under a group action (as opposed
to rooted labeled trees) did not seem to apply. Our generating function method, on
the other hand, finds an explicit formula (Theorem 3) for the number of orbits of each
possible size in the action of $G$ on the set $T_{X_n}$ of assembly trees, for every $n$. This
leads to a formula for the probability of occurrence of a given assembly pathway
(Corollary 5). To apply these formulas it is necessary to know the number of assembly
trees fixed by each given subgroup of $G$. A generating function formula for this number
of fixed assembly trees is given in Theorem 16 of Section 5. For the proof of Theorem 16
it is necessary to characterize the set of such fixed assembly trees. This is done in
Theorem 9 of Section 4.

Concerning Problem (ii), algorithms for permutation groups have been well studied
(see, for example, [?]), and algorithms for tree isomorphism and automorphism are well
known [?]. Moreover, the structure of the automorphism groups of rooted, labeled
trees has been studied [?, 23]. However, we have not encountered an algorithm in the
literature for deciding whether a given permutation group element fixes a given rooted,
labeled tree, and thereby finding the stabilizer of that tree in the given group $G$. In
Section 3.2 of this paper, we provide a simple and intuitive algorithm that is easy to
implement, runs in linear time and operates in place on the input, without the use of
extra scratch memory.

If one is only interested in approximate and asymptotic estimates for Problem (i), 
such as in viruses with large T-numbers, a possible avenue is to use the results of 
[?] that estimate the asymptotic probabilities of logic properties on finite structures,
especially trees. There are significant roadblocks, however, to applying these results 
to our problem. These are mentioned in the open problem section at the end of this 
paper. Finally, there is a rich literature on the enumeration of construction sequences of 
symmetric polyhedra and their underlying graphs [?]. Whereas these studies focus
on enumerating construction sequences of different polyhedra with a given number of 
facets, our goal - of counting and characterizing assembly tree orbits - is geared towards 
enumerating construction sequences of a single polyhedron for any given number of 
facets. 

2 Preliminaries on Assembly Pathways 

All groups, graphs, and label sets in this paper are assumed to be finite. A rooted tree 
is a tree with a designated vertex, called the root. We will use standard terminology 
such as adjacent, child, parent, descendent, ancestor, leaf, subtree rooted at, root of the 
subtree, and so on. For our purposes, a rooted tree is called a labeled tree if the leaves 
are bijectively labeled by the elements of a set X, and an internal (non-leaf) vertex v is 
labeled by the set of leaf-labels of the subtree rooted at v. We identify each vertex in 
a tree with its label. Let $\tau$ and $\tau'$ be two rooted trees labeled in the same set $X$.
Then $\tau$ and $\tau'$ are said to be isomorphic if there is a bijection $f$ (the isomorphism)
between the vertices of $\tau$ and $\tau'$ that preserves adjacency and the root. That is, both
of the following hold.

• $(u, v)$ is an edge in $\tau$ if and only if $(f(u), f(v))$ is an edge in $\tau'$;

• $f(r) = r'$, where $r$ and $r'$ are the roots of $\tau$ and $\tau'$, respectively. In this case, we
say $\tau \cong \tau'$ and also write $f(\tau) = \tau'$.

An automorphism $f$ of $\tau$ is an isomorphism of $\tau$ onto itself: it ensures that $(u, v)$ is an
edge in $\tau$ if and only if $(f(u), f(v))$ is also an edge in $\tau$. In this case $f(\tau) = \tau$.

A rooted tree for which each internal vertex has at least two children and whose
leaves are bijectively labeled with the elements of $X$ is called an assembly tree for $X$.
The 26 assembly trees with four leaves, labeled in the set $X = \{1, 2, 3, 4\}$, are shown in
Figure 5.
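To make the definition concrete, here is a minimal Python sketch (our own illustration, not the paper's code) that enumerates the assembly trees on a label set. Following the convention that a vertex is identified with its set of leaf labels, a tree is stored as the frozenset of its vertex labels; the helper names `set_partitions` and `assembly_trees` are hypothetical. A tree is generated by choosing a partition of the root's label set into at least two blocks and then, recursively, an assembly tree on each block.

```python
from itertools import product


def set_partitions(items):
    """Yield every partition of the list `items` as a list of frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        # ... or into a block of its own.
        yield partition + [frozenset({first})]


def assembly_trees(labels):
    """Yield all assembly trees on `labels`.

    A tree is a frozenset of vertices, each vertex being the frozenset of the
    leaf labels below it; every internal vertex has at least two children.
    """
    labels = frozenset(labels)
    if len(labels) == 1:
        yield frozenset({labels})
        return
    for partition in set_partitions(sorted(labels)):
        if len(partition) < 2:  # the root needs at least two children
            continue
        subtrees = [list(assembly_trees(block)) for block in partition]
        for choice in product(*subtrees):
            yield frozenset({labels}).union(*choice)


if __name__ == "__main__":
    print([sum(1 for _ in assembly_trees(range(1, n + 1))) for n in range(1, 6)])
```

Running it prints `[1, 1, 4, 26, 236]`; the entry for four labels is the count of 26 trees shown in Figure 5.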

Let $G$ be a group acting on a set $X$. The action of $G$ on $X$ induces a natural action
of $G$ on the power set of $X$ and thereby on the vertices (vertex labels) of the assembly
trees in $T_X$. If $g \in G$ and $\tau \in T_X$, then define the tree $g(\tau)$ as the unique
assembly tree whose set of vertex labels (including the labels of internal vertices) is
$\{g(v) : v \in \tau\}$. This tree $g(\tau)$ is clearly isomorphic to $\tau$ via $g$. This induces an action
of $G$ on $T_X$. Each orbit of this action of $G$ on $T_X$ consists of isomorphic trees and is
called an assembly pathway for $(G, X)$.

Example 1 Klein 4-group acting on $T_4$.

Consider the Klein 4-group $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$ acting on the set $X = \{1, 2, 3, 4\}$.
Writing $G$ as a group of permutations in cycle notation, this action is

$$G = \{(1)(2)(3)(4),\ (1\,2)(3\,4),\ (1\,3)(2\,4),\ (1\,4)(2\,3)\}.$$

For this example there are exactly 11 assembly pathways, which are indicated in Figure 5
by boxes around the orbits. There are four assembly pathways of size one, i.e., with
one assembly tree in the orbit, three assembly pathways of size two, and four assembly
pathways of size four, accounting for all $4 \cdot 1 + 3 \cdot 2 + 4 \cdot 4 = 26$ assembly trees.
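The pathway counts of Example 1 can be reproduced by brute force. The Python sketch below is our own illustration (helper names are hypothetical): it repeats the frozenset-of-vertex-labels representation of trees, applies a permutation $g$ to every vertex label to obtain $g(\tau)$, and collects orbits.

```python
from collections import Counter
from itertools import product


def set_partitions(items):
    """Yield every partition of the list `items` as a list of frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        yield partition + [frozenset({first})]


def assembly_trees(labels):
    """Yield assembly trees on `labels` as frozensets of vertex label sets."""
    labels = frozenset(labels)
    if len(labels) == 1:
        yield frozenset({labels})
        return
    for partition in set_partitions(sorted(labels)):
        if len(partition) >= 2:
            subtrees = [list(assembly_trees(b)) for b in partition]
            for choice in product(*subtrees):
                yield frozenset({labels}).union(*choice)


def act(g, tree):
    """The tree g(tau): apply the permutation g (a dict) to every vertex label."""
    return frozenset(frozenset(g[x] for x in vertex) for vertex in tree)


def pathway_sizes(group, labels):
    """Return a Counter mapping orbit size -> number of assembly pathways."""
    remaining = set(assembly_trees(labels))
    sizes = Counter()
    while remaining:
        tree = remaining.pop()
        orbit = {act(g, tree) for g in group}
        remaining -= orbit
        sizes[len(orbit)] += 1
    return sizes


# Klein 4-group on {1, 2, 3, 4}, each element written as a dict x -> g(x).
KLEIN = [
    {1: 1, 2: 2, 3: 3, 4: 4},
    {1: 2, 2: 1, 3: 4, 4: 3},
    {1: 3, 2: 4, 3: 1, 4: 2},
    {1: 4, 2: 3, 3: 2, 4: 1},
]
```

For `pathway_sizes(KLEIN, {1, 2, 3, 4})` the result has four orbits of size 1, three of size 2 and four of size 4, matching the 11 pathways boxed in Figure 5.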

An assembly tree $\tau$ is said to be fixed by an element $g \in G$ if $g$ is an automorphism
of $\tau$, that is, $g(\tau) = \tau$. See Figure 3 for an illustration. For any subgroup $H$ of $G$, let
$t(H) := t_X(H)$ denote the number of trees in $T_X$ that are fixed by all elements of $H$ and
by no other elements of $G$. In other words,

$$t_X(H) = |\{\tau \in T_X \mid \mathrm{stab}_G(\tau) = H\}|. \tag{1}$$

Here $\mathrm{stab}_G(\tau) := \{g \in G \mid g(\tau) = \tau\}$ is called the stabilizer of $\tau$ in $G$; it is
the set of all elements in $G$ that fix $\tau$. It is easy to prove that $\mathrm{stab}_G(\tau)$ is a
subgroup of $G$.

In fact, we will see in Sections 4 and 5 that it is more natural to find $\bar t_X(H)$, i.e., the
number of trees in $T_X$ that are fixed by every element of a subgroup $H$ of $G$. These may
include trees that are fixed by larger subgroups $H'$ such that $H < H' \le G$. As the
following theorem shows, the desired quantities $t_X(H)$ can then be computed from the
numbers $\bar t_X(K)$ using Möbius inversion on the lattice of subgroups of $G$.

Theorem 2 Let $G$ be a group acting on a set $X$. If $H$ is a subgroup of $G$, then

$$t_X(H) = \sum_{H \le K \le G} \mu(H, K)\, \bar t_X(K),$$

where $\mu$ is the Möbius function for the lattice of subgroups of $G$.

Proof: Clearly $\bar t_X(H) = \sum_{H \le K \le G} t_X(K)$, since every tree fixed by $H$ has a unique
stabilizer $K$ with $H \le K \le G$. The theorem follows from the standard Möbius inversion
formula [22] (page 333). □
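Theorem 2 can be checked numerically on the Klein 4-group of Example 1. The Python sketch below is our own illustration, with the points 1, 2, 3, 4 relabeled 0, 1, 2, 3 and permutations stored as tuples; all helper names are hypothetical. It finds the subgroups by brute force, counts $\bar t_X(K)$ (trees fixed by every element of $K$) directly, evaluates the Möbius function of the subgroup lattice from its recursive definition $\mu(H,H) = 1$, $\mu(H,K) = -\sum_{H \le L < K} \mu(H,L)$, and recovers $t_X(H)$.

```python
from functools import lru_cache
from itertools import combinations, product


def set_partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        yield partition + [frozenset({first})]


def assembly_trees(labels):
    """Assembly trees on `labels`, each a frozenset of vertex label sets."""
    labels = frozenset(labels)
    if len(labels) == 1:
        yield frozenset({labels})
        return
    for partition in set_partitions(sorted(labels)):
        if len(partition) >= 2:
            for choice in product(*[list(assembly_trees(b)) for b in partition]):
                yield frozenset({labels}).union(*choice)


def compose(p, q):
    """Composition p∘q (apply q first) of permutations given as tuples."""
    return tuple(p[q[x]] for x in range(len(q)))


def fixes(g, tree):
    return frozenset(frozenset(g[x] for x in v) for v in tree) == tree


def subgroups(group):
    """All subgroups of a small finite group, by brute force over subsets."""
    identity = tuple(range(len(group[0])))
    others = [g for g in group if g != identity]
    found = []
    for r in range(len(others) + 1):
        for subset in combinations(others, r):
            h = frozenset((identity,) + subset)
            if all(compose(a, b) in h for a in h for b in h):
                found.append(h)
    return found


def exact_stabilizer_counts(group, labels):
    """Map each subgroup H to t_X(H) by Möbius inversion of the counts tbar_X(K)."""
    lattice = subgroups(group)
    trees = list(assembly_trees(labels))
    tbar = {k: sum(all(fixes(g, t) for g in k) for t in trees) for k in lattice}

    @lru_cache(maxsize=None)
    def mu(h, k):
        if h == k:
            return 1
        return -sum(mu(h, mid) for mid in lattice if h <= mid < k)

    return {h: sum(mu(h, k) * tbar[k] for k in lattice if h <= k) for h in lattice}


KLEIN = [(0, 1, 2, 3), (1, 0, 3, 2), (2, 3, 0, 1), (3, 2, 1, 0)]
```

For `exact_stabilizer_counts(KLEIN, range(4))` the trivial subgroup gets 16, each subgroup of order 2 gets 2, and $G$ gets 4, which are the values listed in Example 4.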

The index of a subgroup $H$ in $G$ is the number of left (equivalently, right) cosets of
$H$ in $G$, and is denoted by $(G : H)$. By Lagrange's Theorem, this index equals $|G|/|H|$.

Figure 5: Klein 4-group acting on $T_4$.
Theorem 3 The number of trees in any assembly pathway for $(G, X)$ divides $|G|$. If $m$
divides $|G|$, then the number $N(m)$ of assembly pathways of size $m$ is

$$N(m) = \frac{1}{m} \sum_{H \le G\,:\,(G:H) = m} t_X(H).$$

Proof: It is a standard consequence of Lagrange's Theorem that, for any assembly tree
$\tau$, the equality

$$|G| = |O(\tau)| \cdot |\mathrm{stab}(\tau)|$$

holds, where $O(\tau)$ is the orbit of $\tau$. This immediately implies the first statement of the
theorem. Let

$$\delta(\tau) = \begin{cases} 1 & \text{if } (G : \mathrm{stab}(\tau)) = m \\ 0 & \text{otherwise} \end{cases}
\;=\; \begin{cases} 1 & \text{if } |O(\tau)| = m \\ 0 & \text{otherwise.} \end{cases}$$

Now count, in two ways, the number of pairs $(H, \tau)$ where $\tau$ is an assembly tree, $H \le G$
is the stabilizer of $\tau$, and $(G : H) = m$:

$$\sum_{H \le G\,:\,(G:H) = m} t(H) = \sum_{\tau \in T_X} \delta(\tau) = m\,N(m).$$

Indeed, to justify the first equality, note that for a fixed subgroup $H$ that has index $m$
in $G$, exactly $t(H)$ trees $\tau$ have stabilizer $H$, and each of them satisfies $\delta(\tau) = 1$. To
justify the second equality, note that $\delta(\tau) = 1$ if and only if $\tau$ is one of the $m$ elements
of an $m$-element pathway, and there are $N(m)$ such pathways. □
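For concreteness, the following Python sketch (our own; the function name `pathways_by_size` is hypothetical) evaluates the formula of Theorem 3 from the orders $|H|$ and counts $t_X(H)$ of all subgroups, and also returns the pathway probability $m/|T_X|$ of Corollary 5. It uses $|T_X| = \sum_{H \le G} t_X(H)$, which holds because every tree has exactly one stabilizer.

```python
from fractions import Fraction


def pathways_by_size(group_order, t_values):
    """Apply Theorem 3 and Corollary 5.

    `t_values` lists pairs (|H|, t_X(H)), one for every subgroup H of G.
    Returns {m: (N(m), probability of each pathway of size m)}.
    """
    total_trees = sum(t for _, t in t_values)  # |T_X|: one stabilizer per tree
    sums = {}
    for order, t in t_values:
        m = group_order // order  # the index (G : H), by Lagrange's Theorem
        sums[m] = sums.get(m, 0) + t
    result = {}
    for m in sorted(sums):
        if sums[m] % m:
            raise ValueError(f"sum of t(H) over index-{m} subgroups is not divisible by {m}")
        result[m] = (sums[m] // m, Fraction(m, total_trees))
    return result
```

With the Klein 4-group values of Example 4, `pathways_by_size(4, [(4, 4), (2, 2), (2, 2), (2, 2), (1, 16)])` gives $N(1) = 4$, $N(2) = 3$, $N(4) = 4$ with probabilities $1/26$, $1/13$ and $2/13$.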



Example 4 Klein 4-group acting on $T_4$ (continued).

Theorem 3, applied to our previous example of $\mathbb{Z}_2 \oplus \mathbb{Z}_2$ acting simply on $\{1, 2, 3, 4\}$,
states that the size of an assembly pathway must be 1, 2 or 4, since it must be a divisor
of $4 = |\mathbb{Z}_2 \oplus \mathbb{Z}_2|$. To find the number of pathways of each size, note that $G$ has three
subgroups of order 2, namely

$$K_1 = \{(1)(2)(3)(4),\ (1\,2)(3\,4)\},$$
$$K_2 = \{(1)(2)(3)(4),\ (1\,3)(2\,4)\},$$
$$K_3 = \{(1)(2)(3)(4),\ (1\,4)(2\,3)\},$$

and that

$$t(G) = 4, \qquad t(K_1) = t(K_2) = t(K_3) = 2, \qquad t(K_0) = 16,$$

where $K_0$ denotes the trivial subgroup of order 1. The assembly trees in $T_X$ that are
fixed by all elements of $G$ are shown in Figure 5 A, B, C, D. For $i = 1, 2, 3$, those
assembly trees in $T_X$ that are fixed by all elements of $K_i$ and by no other elements of
$G$ are shown in Figure 5 E, F, G, respectively. The remaining 16 assembly trees
in Figure 5 are fixed by no elements of $G$ except the identity. Therefore, according to
Theorem 3, the numbers of pathways of sizes 1, 2 and 4 are, respectively,

$$N(1) = t(G) = 4, \qquad N(2) = \tfrac{1}{2}(2 + 2 + 2) = 3, \qquad N(4) = \tfrac{1}{4}\, t(K_0) = \tfrac{16}{4} = 4.$$

A general formula for $t(H)$ is the subject of Sections 4 and 5.

The set $T_X$ of assembly trees can be made the sample space of a probability space
$(T_X, p)$ by assuming that each assembly tree $\tau \in T_X$ is equally likely, i.e.,
$p(\tau) = 1/|T_X|$. Clearly, if $O$ is an assembly pathway, then

$$p(O) = \frac{|O|}{|T_X|}.$$

The following result follows immediately from Theorem 3.

Corollary 5 If $G$ acts on the set $X$ and $m$ divides $|G|$, then, with notation as in
Theorem 3, there are exactly $N(m)$ assembly pathways with probability $m/|T_X|$, and no
other values can occur as the probability of an assembly pathway.

Example 6 Klein 4-group acting on $T_4$ (continued).

Again, for our example of $\mathbb{Z}_2 \oplus \mathbb{Z}_2$ acting simply on $\{1, 2, 3, 4\}$, where $|T_X| = 26$,
application of Corollary 5 gives

4 pathways with probability $1/26$,

3 pathways with probability $2/26 = 1/13$,

4 pathways with probability $4/26 = 2/13$.

3 Algorithm for determining the stabilizer of an assembly tree in a given finite group

The algorithm in this section takes as input a finite permutation group $G$ acting on a
finite set $X$ and an assembly tree $\tau \in T_X$, and finds the stabilizer $\mathrm{stab}_G(\tau)$. The
idea behind the algorithm is encapsulated by the following proposition, whose proof
follows directly from the definitions given in Section 2. As defined in Section 2, the
action of the permutation group $G$ on $X$ induces a natural action of $G$ on $T_X$.
Proposition 7 Let the finite permutation group $G$ act on a finite set $X$.

1. Let $R$ be any set of elements of $G$ that fix $\tau$, and let $C$ be any set of elements of
$G$ that do not fix $\tau$. Then $\langle R \rangle$, the group generated by the elements of $R$, is a
subgroup of $\mathrm{stab}_G(\tau)$, and $\bigcup_{c \in C} c\langle R \rangle$, the union of the left cosets of $\langle R \rangle$ given
by $C$, has an empty intersection with $\mathrm{stab}_G(\tau)$.

2. An element $g \in G$ fixes $\tau$ if and only if for every vertex $v \in \tau$ with children
$c_1(v), \ldots, c_k(v)$, the vertices $g(c_1(v)), \ldots, g(c_k(v))$ have a common parent in $\tau$,
and this parent is $g(v)$.

3.1 Input and Data Structures

In this subsection, we give the detailed setup for our stabilizer-finding algorithm. The
input of this algorithm is the set of elements $g$ of the finite permutation group $G$ acting
on the finite set $X$ and an assembly tree $\tau \in T_X$.

We use a tree data structure, where each vertex has a child pointer to each of its
children and a parent pointer to its parent. The root (and the tree $\tau$ itself) can be
accessed by the root pointer. Furthermore, each permutation $g \in G$ on $X$ is input as
a set of $g$-pointers on the leaves. That is, a leaf labeled $u$ has a $g$-pointer to the leaf
labeled $g(u)$. However, the labels are not explicitly stored except at the leaves. This is
a common data structure used in permutation group algorithms [17].
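A minimal Python rendering of this data structure, under assumptions of our own: trees are built from nested tuples of leaf labels, and a permutation is stored as a dictionary from leaf objects to leaf objects, standing in for the $g$-pointers. The names `Node`, `build_tree` and `g_pointers` are hypothetical.

```python
class Node:
    """Tree vertex with a parent pointer and child pointers.

    Only leaves store a label; an internal vertex is identified implicitly
    with the set of leaf labels below it.
    """

    __slots__ = ("label", "parent", "children")

    def __init__(self, label=None, parent=None):
        self.label = label
        self.parent = parent
        self.children = []


def build_tree(spec, parent=None, leaves=None):
    """Build a tree from nested tuples, e.g. ((1, 2), 3, 4).

    Returns (root, leaves), where `leaves` maps each label to its leaf node.
    """
    if leaves is None:
        leaves = {}
    if isinstance(spec, tuple):
        node = Node(None, parent)
        for sub in spec:
            child, _ = build_tree(sub, node, leaves)
            node.children.append(child)
    else:
        node = Node(spec, parent)
        leaves[spec] = node
    return node, leaves


def g_pointers(perm, leaves):
    """Encode a permutation of the labels as pointers: leaf u -> leaf g(u)."""
    return {leaves[u]: leaves[perm[u]] for u in leaves}
```

For example, `build_tree(((1, 2), 3, 4))` yields the tree of Figure 6, whose root has the three children $\{1, 2\}$, 3 and 4.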

3.2 Algorithms 

The first algorithm computes $\mathrm{stab}_G(\tau)$, and it uses the second and third algorithms
for determining whether a permutation $g$ fixes $\tau$. The correctness of the first algorithm
follows directly from Proposition 7(1), assuming the correctness of the latter algorithms.
The last algorithm is recursive and operates in place with no extra scratch space. For
each vertex $v$, working bottom up, it efficiently checks whether the image $g(v)$ is in $\tau$.
Its correctness follows directly from Proposition 7(2).

Algorithm Stabilizer

Input: assembly tree τ ∈ T_X; permutation group G

Output: generating set R_τ such that the group ⟨R_τ⟩ generated by R_τ is exactly stab_G(τ).

R := {id}     (currently known partial generating set of stab_G(τ))
C_R := ∅      (distinct left coset representatives of ⟨R⟩ that are currently known to not fix τ)
U := G        (currently undecided elements of G)
do until U = ∅
    let g ∈ U
    if Fixes(g, τ)
        then R := R ∪ {g}; retain in C_R at most one representative from any left coset of ⟨R⟩
        else C_R := C_R ∪ {g}
    fi
    U := U \ (⟨R⟩ ∪ ⋃_{c ∈ C_R} c⟨R⟩)
od

return R_τ := R.
Algorithm Fixes

Input: assembly tree τ ∈ T_X; permutation g acting on X.
Output: "true" if g fixes τ; "false" otherwise.

if LocateImage(g, τ, root(τ)) = root(τ)
    then return true
    else return false.

Algorithm LocateImage

Input: assembly tree τ ∈ T_X; permutation g acting on X; child pointer to a vertex v ∈ τ
(root pointer if v is the root of τ).

Output: a parent pointer to the vertex g(v) ∈ τ, if it exists, such that g is the
isomorphism mapping the subtree of τ rooted at v to the subtree of τ rooted at g(v);
if such a vertex g(v) does not exist in τ, returns null.

if v is a leaf of τ (null child pointer)
    then return g(v) (follow g-pointer)
else
    let c_1, ..., c_k be the children of v;
    if parent(LocateImage(g, τ, c_1)) = parent(LocateImage(g, τ, c_2)) = ...
            = parent(LocateImage(g, τ, c_k)) =: w
            (the test fails if any recursive call returns null)
        then return w
        else return null.
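The following Python sketch transcribes LocateImage and Fixes, using a hypothetical `Node`/`build_tree`/`g_pointers` representation (repeated so the block runs on its own) in which each permutation is a dictionary of leaf-to-leaf pointers. It returns the image vertex itself rather than a pointer. As a cheap safeguard that we added and that is not part of the pseudocode above, it also checks that the candidate parent $w$ has as many children as $v$; this keeps the check sound by a simple induction on the depth of $v$ and costs only constant time per vertex.

```python
class Node:
    __slots__ = ("label", "parent", "children")

    def __init__(self, label=None, parent=None):
        self.label = label      # only leaves carry a label
        self.parent = parent
        self.children = []


def build_tree(spec, parent=None, leaves=None):
    """Build a tree from nested tuples; return (root, {label: leaf node})."""
    if leaves is None:
        leaves = {}
    if isinstance(spec, tuple):
        node = Node(None, parent)
        for sub in spec:
            child, _ = build_tree(sub, node, leaves)
            node.children.append(child)
    else:
        node = Node(spec, parent)
        leaves[spec] = node
    return node, leaves


def g_pointers(perm, leaves):
    """Encode a permutation of labels as leaf -> leaf pointers."""
    return {leaves[u]: leaves[perm[u]] for u in leaves}


def locate_image(g, v):
    """Return the vertex of the tree equal to g(v), or None if there is none."""
    if not v.children:             # leaf: follow its g-pointer
        return g[v]
    w = None
    for c in v.children:
        image = locate_image(g, c)
        if image is None or image.parent is None:
            return None
        if w is None:
            w = image.parent       # candidate for g(v)
        elif image.parent is not w:
            return None            # images of the children have different parents
    # Safeguard added in this sketch: w must have exactly as many children as v.
    if len(w.children) != len(v.children):
        return None
    return w


def fixes(g, root):
    """True iff the permutation g is an automorphism of the tree at `root`."""
    return locate_image(g, root) is root
```

On the tree of Figure 6, built as `build_tree(((1, 2), 3, 4))`, the permutation $(1\,2)(3\,4)$ of Figures 6 and 7 is accepted and $(1\,4)(2\,3)$ of Figures 8 and 9 is rejected.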

3.3 Complexity 

Algorithm LocateImage follows each pointer (child, g, parent) exactly once, as is
illustrated by the example shown in Figures 6 to 9, and does only constant-time operations
between pointer accesses. Since an assembly tree with $|X|$ leaves has fewer than $2|X|$
vertices, it takes $O(|X|)$ time. It operates in place and does not require any extra
scratch space. Algorithm Stabilizer, in the worst case, can be a brute-force algorithm
that simply runs through all the elements of $G$ instead of maintaining efficient
representations of $R$, $C_R$ and $U$. In this case, it takes no more than $O(|G||X|)$ time.
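To illustrate the pruning in Algorithm Stabilizer, here is a Python sketch under simplifying assumptions of our own: permutations are tuples over `range(n)`, $\langle R \rangle$ is recomputed by naive closure (not with Sims's strong generating sets), and the pointer-based Fixes is replaced by a direct comparison $g(\tau) = \tau$ on a tree stored as the set of its vertex labels. All names are hypothetical. Each element removed from $U$ is justified by Proposition 7(1).

```python
def compose(p, q):
    """Composition p∘q (apply q first) of permutations given as tuples."""
    return tuple(p[q[x]] for x in range(len(q)))


def generated_subgroup(gens, n):
    """Closure of `gens` under composition; equals <gens> for a finite group."""
    identity = tuple(range(n))
    group, frontier = {identity}, [identity]
    while frontier:
        h = frontier.pop()
        for s in gens:
            k = compose(h, s)
            if k not in group:
                group.add(k)
                frontier.append(k)
    return group


def fixes(g, tree):
    """Stand-in for Algorithm Fixes: compare g(tau) with tau as sets of labels."""
    return frozenset(frozenset(g[x] for x in v) for v in tree) == tree


def stabilizer(tree, group, n):
    """Return (R, <R>) with <R> = stab_G(tree), pruning U by left cosets."""
    identity = tuple(range(n))
    gens = {identity}                      # R
    sub = generated_subgroup(gens, n)      # <R>
    reps = []                              # C_R
    undecided = set(group)                 # U
    while undecided:
        g = next(iter(undecided))
        if fixes(g, tree):
            gens.add(g)
            sub = generated_subgroup(gens, n)
            kept, covered = [], set()      # one representative per left coset c<R>
            for c in reps:
                if c not in covered:
                    kept.append(c)
                    covered |= {compose(c, h) for h in sub}
            reps = kept
        else:
            reps.append(g)
        decided = set(sub)                 # <R> and every coset c<R> are decided
        for c in reps:
            decided |= {compose(c, h) for h in sub}
        undecided -= decided
    return gens, sub
```

With the Klein 4-group on $\{0, 1, 2, 3\}$ and the tree whose root has children $\{0, 1\}$, 2 and 3, the sketch returns the order-2 subgroup generated by $(0\,1)(2\,3)$.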

However, readers familiar with Sims's method for representing permutation groups
[17] using so-called strong generating sets and Cayley graphs may appreciate the following
remarks. Instead of specifying the input of our problem as we have done, we may
assume that a Cayley graph is input, which uses a strong generating set of $G$. With this
input representation, the time complexity of our algorithms can be optimized
significantly further, the degree of optimization depending on properties of the group $G$.

Figure 6: LocateImage is called on the root of the assembly tree $\tau$ shown on the
left, for the permutation $g$ shown on the right. This permutation $g$ fixes $\tau$. The data
structure representing the tree consists of the child (blue, dashed) and parent (green,
solid) pointers, and the data structure representing $g$, via its action on the leaf-label
set $X$, consists of $g$-pointers (red, dotted).

3.4 Example 

The two examples shown in Figures 6 to 9 illustrate the algorithms LocateImage and
Fixes. Figure 6, for the first example, shows the assembly tree $\tau$ and the associated
data structures, as well as the group element $g$. Figure 7 shows a run of LocateImage
applied to $\tau$ and $g$ at $\mathrm{root}(\tau)$. The algorithm establishes that the given permutation $g$
fixes the given assembly tree $\tau$, whereby Fixes returns 'true.' The second example, in
Figure 8, uses the same assembly tree $\tau$, but a different permutation $g$. The run of
LocateImage in Figure 9 is unsuccessful, whereby Fixes returns 'false.'

4 Block systems and Fixed Assembly Trees 

The formulas in Section 2 for the number of orbits of each size and for the orbit sizes or
pathway probabilities (Theorem 3 and Corollary 5) depend on the number of assembly
trees fixed by a group. A formula for the number of such fixed trees is the subject of 
this and the next section. 

Recall that an assembly tree $\tau$ is fixed by a group $G$ acting on $X$ if $g(\tau) = \tau$ for
all $g \in G$. Two main results of this section (Corollary 11 and Procedure 12) provide a
recursive procedure for constructing all trees in $T_X$ that are fixed by $G$. This leads, in
the next section, to a generating function for the number of such fixed trees. The results
in this section depend on a characterization (Theorem 9) of block systems arising from
a group acting on a set.

Figure 7: A successful run of LocateImage for the permutation $g = (1\,2)(3\,4)$, where, for
all vertices $v$ in $\tau$, the image $g(v)$ is established to be in the assembly tree $\tau$ shown in
Figure 6. On the right are the pointers traversed so far, in traversal order. On the left is
the current recursion stack of LocateImage calls (first call at the bottom), together with
those vertices $v$ ($v \in X$ or $v \subseteq X$) for which $g(v)$ has been established to be in $\tau$,
showing that $g$ is an isomorphism between the two subtrees of $\tau$ rooted at $v$ and at
$g(v)$, respectively.

Figure 8: LocateImage is called on the root of the same tree $\tau$ as in Figure 6, but for
a different permutation $g = (1\,4)(2\,3)$, shown on the right. In this case $g$ does not fix $\tau$.

Figure 9: An unsuccessful run of LocateImage. Since $\{1, 2\}$, 3 and 4 are the children
of the root, when LocateImage(root) is called, it checks whether their images under $g$
have the same parent. So the recursive calls are to LocateImage($\{1, 2\}$), LocateImage(3),
and LocateImage(4). LocateImage($\{1, 2\}$) returns the root of $\tau$ as a candidate for
$g(\{1, 2\})$. But LocateImage(3) returns $\{1, 2\}$, because the parent of $g(3) = 2$ is $\{1, 2\}$.
Hence LocateImage(root) returns null and Fixes returns 'false.'

For a group $G$ acting on a set $X$, a block is a subset $B \subseteq X$ such that for each $g \in G$,
either $g(B) = B$ or $g(B) \cap B = \emptyset$. A block system is a partition of $X$ into blocks. A
block system $\mathcal{B}$ will be said to be compatible with the group action if $g(B) \in \mathcal{B}$ for all
$g \in G$ and $B \in \mathcal{B}$. A characterization of compatible block systems (Theorem 9) is
relevant to the understanding of fixed assembly trees because of the following result.
Let $\tau$ be any assembly tree in $T_X$. For any vertex $v$ of $\tau$, recall that $v$ is identified with
and labeled by its set of descendant leaf-labels. Thus the set of labels of the children of
the root is a partition of $X$.

Lemma 8 Let $G$ act on $X$, and let $\tau$ be an assembly tree for $X$ that is fixed by $G$. If
$U$ is the set of children of the root of $\tau$, then $U$ is a block system that is compatible
with the action of $G$ on $X$.

Proof: For any $v \in U$, let $\tau_v$ be the rooted, labeled subtree of $\tau$ that consists of root $v$
and all its descendants. If $\tau$ is fixed by $G$, then $g(\tau) = \tau$ for each $g \in G$, so $g$ is an
automorphism of $\tau$; it fixes the root and therefore permutes the children of the root. In
other words, $g(\tau_v) = \tau_u$ for some $u \in U$. This implies that $g(v) \cap v = \emptyset$ if $u \ne v$, or
$g(v) = v$ if $u = v$. Hence $U$ is a block system that is compatible with the action of $G$
on $X$. □
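Lemma 8 can be checked mechanically on the Klein 4-group of Example 1. In the Python sketch below (our own illustration; helper names are hypothetical, and trees are again stored as sets of vertex labels), we enumerate the trees fixed by $G$ and confirm that the children of each root form a compatible block system, while, for instance, the partition $\{\{1, 2\}, \{3\}, \{4\}\}$ is not compatible with $G$.

```python
from itertools import product


def set_partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        yield partition + [frozenset({first})]


def assembly_trees(labels):
    """Assembly trees on `labels`, each a frozenset of vertex label sets."""
    labels = frozenset(labels)
    if len(labels) == 1:
        yield frozenset({labels})
        return
    for partition in set_partitions(sorted(labels)):
        if len(partition) >= 2:
            for choice in product(*[list(assembly_trees(b)) for b in partition]):
                yield frozenset({labels}).union(*choice)


def image(g, s):
    return frozenset(g[x] for x in s)


def fixed_by(group, tree):
    return all(frozenset(image(g, v) for v in tree) == tree for g in group)


def root_children(tree):
    """Maximal proper vertices, i.e. the children of the root."""
    root = max(tree, key=len)
    proper = [v for v in tree if v != root]
    return [v for v in proper if not any(v < w for w in proper)]


def is_compatible_block_system(parts, group):
    """For a partition `parts`: True iff g(B) is again a part for all g and B."""
    parts = set(parts)
    return all(image(g, b) in parts for g in group for b in parts)


KLEIN = [
    {1: 1, 2: 2, 3: 3, 4: 4},
    {1: 2, 2: 1, 3: 4, 4: 3},
    {1: 3, 2: 4, 3: 1, 4: 2},
    {1: 4, 2: 3, 3: 2, 4: 1},
]
```

Since the parts of a partition are disjoint, requiring $g(B)$ to be a part already implies that each part is a block.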

The following notation will be used in this section. The set of orbits of $G$ acting
on $X$ will be denoted by $\mathcal{O}$. For $H \le G$, let $C_H$ denote a set of (say left) coset
representatives of $H$ in $G$. Note that $|C_H| = (G : H)$. For $r \in G$ and $Q \subseteq X$, let
$r(Q) := \{r(q) : q \in Q\}$. A group $G$ is said to act simply on $X$ if the stabilizer of each
$x \in X$ is the trivial group. In this paper, a partition $\Pi$ of a finite set $S$ into $k$ parts is
a set $\{\pi_1, \pi_2, \ldots, \pi_k\}$ of pairwise disjoint non-empty subsets such that $\bigcup_{i=1}^{k} \pi_i = S$.
The subsets $\pi_i$ are called the parts of the partition $\Pi$. The order of the parts of a
partition is insignificant. That is, $\{\{1, 3\}, \{2, 4\}\}$ and $\{\{2, 4\}, \{1, 3\}\}$ are identical
partitions of the set $\{1, 2, 3, 4\}$. We nevertheless label the parts from 1 to $k$ for
convenience.

Theorem 9 Let us assume that $G$ acts simply on $X$. Let $\Pi = \{\pi_1, \pi_2, \ldots, \pi_k\}$ be a partition of $\mathcal{O}$ into arbitrarily many parts, and let $\mathcal{H} = \{H_1, H_2, \ldots, H_k\}$ be a corresponding set of subgroups of $G$. For each $i$ and each $O \in \pi_i$, let $Q_{i,O}$ be any single orbit of the simple action of $H_i$ on $O$. Let $Q_i = \bigcup_{O \in \pi_i} Q_{i,O}$ and $\mathcal{Q} = \{Q_1, Q_2, \ldots, Q_k\}$. Let us denote by $(\Pi, \mathcal{H}, \mathcal{Q})$ the arrangement $\{(\pi_1, H_1, Q_1), \ldots, (\pi_k, H_k, Q_k)\}$ of each $\pi_i$ in $\Pi$ with a corresponding subgroup $H_i \le G$ in $\mathcal{H}$ and $Q_i \in \mathcal{Q}$.

1. The collection

$$\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q}) = \bigcup_{i=1}^{k} \{\, r(Q_i) : r \in C_{H_i} \,\}$$

of blocks $r(Q_i)$ is a compatible block system for $G$ acting on $X$.

2. Every compatible block system for $G$ acting on $X$ is of the above form for some choice of $\Pi$, $\mathcal{H}$, and $\mathcal{Q}$.

3. Two such block systems $\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$ and $\mathcal{B}(\Pi', \mathcal{H}', \mathcal{Q}')$ are equal if and only if there is a permutation $p$ of the set of parts of $\Pi'$ so that for all $i \le k$ we have $\pi'_{p(i)} = \pi_i$, and for all $i \le k$ there exists a $g_i \in G$ so that $H'_{p(i)} = g_i H_i g_i^{-1}$ and $Q'_{p(i)} = g_i(Q_i)$.
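Statement (1) is constructive, and the following Python sketch builds $\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$ from a list of pairs $(H_i, Q_i)$ by choosing left coset representatives of each $H_i$ and collecting the blocks $r(Q_i)$. Permutations are dictionaries and all helper names are illustrative assumptions; the example reproduces two of the block systems listed in Example 10 for $K = \{(1)(2)(3)(4), (1\,2)(3\,4)\}$.

```python
def compose(g, h):
    """The permutation g∘h: apply h first, then g."""
    return {x: g[h[x]] for x in h}

def key(g):
    """Hashable form of a permutation."""
    return frozenset(g.items())

def left_coset_reps(G, H):
    """One representative r for each left coset rH of H in G."""
    reps, covered = [], set()
    for g in G:
        if key(g) not in covered:
            reps.append(g)
            covered |= {key(compose(g, h)) for h in H}
    return reps

def block_system(G, arrangement):
    """B(Pi, H, Q): all blocks r(Q_i) with r in C_{H_i}, given the pairs (H_i, Q_i)."""
    blocks = set()
    for H, Q in arrangement:
        for r in left_coset_reps(G, H):
            blocks.add(frozenset(r[q] for q in Q))
    return blocks

e = {1: 1, 2: 2, 3: 3, 4: 4}
a = {1: 2, 2: 1, 3: 4, 4: 3}
K, K0 = [e, a], [e]

# One part {1,2},{3,4} with H = K0 and Q = {1,3}: the system (13)(24)
print(block_system(K, [(K0, {1, 3})]))
# Two parts {1,2}|{3,4} with (H_1, H_2) = (K, K0) and Q = {1,2}|{3}: the system (12)(3)(4)
print(block_system(K, [(K, {1, 2}), (K0, {3})]))
```

Because $H_i(Q_i) = Q_i$, the result does not depend on which coset representatives `left_coset_reps` happens to pick.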

Proof: In order to prove Statement (1), let $H \in \mathcal{H}$, let $r \in C_H$, and let $B = r(Q)$, where $Q = Q_i$ is the element of $\mathcal{Q}$ corresponding to $H = H_i$. We first show that $B$ is a block. There is a subset $A \subseteq X$ containing at most one element from each $G$-orbit such that $B = rH(A)$. If $g(B) \cap B \neq \emptyset$ for some $g \in G$, then there are elements $a, a' \in A$ such that $grh(a) = rh'(a')$ for some $h, h' \in H$. Thus $a$ and $a'$ are in the same $G$-orbit, which implies that $a = a'$. Therefore $grh(a) = rh'(a)$. Since $G$ acts simply, this implies that $grh = rh'$, which in turn implies that $gr$ and $r$ are in the same coset of $H$ in $G$. Therefore $g(B) = gr(Q) = r(Q) = B$. This proves not only that $B$ is a block, but that $\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$ is a block system, because $\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$ is a partition of $X$ into blocks. Moreover, if $r(Q) \in \mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$ and $g \in G$, then by definition $gr(Q) \in \mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$, which shows that $\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$ is a block system compatible with $G$.

In order to prove Statement (2), let us denote the set of orbits of $G$ in its action on $X$ by $\{O_1, O_2, \ldots, O_n\}$. We first show that any block $B$ in the action of $G$ on $X$ is of the form $B = B_1 \cup B_2 \cup \cdots \cup B_n$, where $B_i$ is a single orbit of some subgroup $H \le G$ acting on $O_i$. Let $B_i = B \cap O_i$. Note that $B_i = \emptyset$ is a possibility, in which case, after relabeling, we have $B = B_1 \cup B_2 \cup \cdots \cup B_m$ with $m < n$. Each $B_i$ itself must be a block because, if $g(B_i) \cap B_i \neq \emptyset$, then $g(B) \cap B \neq \emptyset$. However, $B$ is a block, so $g(B) \cap B \neq \emptyset$ implies that $g(B) = B$, and thus $g(B_i) = B_i$.

Let $H_i = \{h \in G \mid h(B_i) = B_i\}$. We claim that $H_1 = H_2 = \cdots = H_m$. To see this, let $h \in H_i$. Since $B$ is a block, either $h(B) \cap B = \emptyset$ or $h(B) = B$. However, $h(B) \cap B = \emptyset$ is impossible because $h(B_i) = B_i$. Hence $h(B) = B$. Now $B_j = B \cap O_j$ implies, for each $j$, that $h(B_j) = B_j$. Therefore $H_i \subseteq H_j$ for all $i, j$. This verifies the claim, so let $H = H_1 = H_2 = \cdots = H_m$.

The proof that each block $B$ is of the required form is complete if it can be shown that $H$ acts transitively on $B_i$ for each $i$. To see this, let $x, y \in B_i$. Since $B_i$ lies in a single $G$-orbit, there is a $g \in G$ such that $g(x) = y$. Since $B_i$ has been shown to be a block and $g(B_i) \cap B_i \neq \emptyset$, it must be the case that $g(B_i) = B_i$. Therefore $g \in H_i = H$.

To complete the proof of Statement (2), let $\mathcal{B}$ be any compatible block system for $G$ acting on $X$. We have proved that if $B \in \mathcal{B}$, then $B = B_1 \cup B_2 \cup \cdots \cup B_m$, where $B_i$ is a single orbit of some subgroup $H \le G$ acting on $O_i$. Because of the compatibility, the action of $G$ on $X$ induces an action of $G$ on $\mathcal{B}$. The orbits under this action provide a partition $\Pi$ of $\mathcal{O}$, a part $\pi \in \Pi$ consisting of all $G$-orbits on $X$ contained in the union of a single $G$-orbit on $\mathcal{B}$. Consider any orbit $G(B)$ of $\mathcal{B}$ in this action. If $B'$ is another element of $G(B)$, then there is an $r \in G$ such that $B' = r(B)$. This shows that the blocks in $G(B)$ are of the desired form in Statement (1) of the theorem. Repeating this argument for each part in the partition $\Pi$ completes the proof of Statement (2).

To prove Statement (3), we first show that if $Q \in \mathcal{Q}$ is the union of $H$-orbits and $H'(Q) = Q$, where $H, H' \in \mathcal{H}$, then $H' = H$. Restricting attention to just one orbit of $G$ in its action on $X$, the equality $H'(Q) = Q$ implies that $H'(a') = H(a)$ for some $a, a'$ in the same $G$-orbit acting on $X$. Let $g \in G$ be such that $a = g(a')$, and hence $Hg(a') = H'(a')$, which in turn implies that $hg(a') = a'$ for some $h \in H$. Because $G$ acts simply, this implies that $g = h^{-1} \in H$, so $H(a') = H'(a')$, which again, by the simplicity of the action, implies that $H' = H$.

Now let us assume that $\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q}) = \mathcal{B}(\Pi', \mathcal{H}', \mathcal{Q}')$. Clearly, $\Pi = \Pi'$. It is sufficient to restrict our attention to just one of the parts in the partition $\Pi = \Pi'$, so we must show that $\{r(Q) : r \in C_H\} = \{r(Q') : r \in C_{H'}\}$ if and only if $H' = gHg^{-1}$ and $Q' = g(Q)$ for some $g \in G$. If $H' = gHg^{-1}$ and $Q' = g(Q)$ for some $g \in G$, then for any $r \in G$ we have $r(Q') = rH'(Q') = (rgHg^{-1})g(Q) = rgH(Q)$. This shows that $\{r(Q') : r \in C_{H'}\} \subseteq \{r(Q) : r \in C_H\}$, and the opposite inclusion is similarly shown. Conversely, assume that $\{r(Q) : r \in C_H\} = \{r(Q') : r \in C_{H'}\}$. Since $Q' \in \{r(Q') : r \in C_{H'}\}$, we know that $Q' = r(Q)$ for some $r \in C_H \subseteq G$. Now $(rHr^{-1})(Q') = (rHr^{-1})(r(Q)) = rH(Q) = r(Q) = Q'$. By the uniqueness result shown in the preceding paragraph, we get $H' = rHr^{-1}$. □

Example 10 Klein 4-group acting on X (continued).

Continuing the example from the previous section with $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$ acting simply on $X = \{1, 2, 3, 4\}$, let $K = K_1 = \{(1)(2)(3)(4), (1\,2)(3\,4)\}$ and let $K_0$ be the trivial subgroup. There are 11 blocks in the action of $K$ on $X$, which are given below:

{1,2,3,4}, {1,2}, {3,4}, {1,3}, {2,4}, {1,4}, {2,3}, {1}, {2}, {3}, {4}.

The seven compatible block systems for the action of $K$ on $X$ can be found using Theorem 9. In what follows, {1,2}|{3,4} denotes the set of orbits {{1,2},{3,4}} partitioned into the two parts {1,2} and {3,4}, whereas {1,2},{3,4} denotes that same set of orbits partitioned the trivial way, into one part. On the right-hand side, a block system is written in cycle-like notation, so that (13)(24) stands for {{1,3},{2,4}}. Note that $\mathcal{B}$({1,2}|{3,4}, {K_0, K}, {2}|{3,4}), for example, is not included in the list below. This is because, according to Statement (3) in Theorem 9,

$\mathcal{B}$({1,2}|{3,4}, {K_0, K}, {2}|{3,4}) = $\mathcal{B}$({1,2}|{3,4}, {K_0, K}, {1}|{3,4}).

Namely, for $g = (1\,2)(3\,4)$, we have $\{2\} = g(\{1\})$ and $K_0 = gK_0g^{-1}$.

$\mathcal{B}$({1,2},{3,4}, {K}, {1,2,3,4}) = (1234)
$\mathcal{B}$({1,2},{3,4}, {K_0}, {1,3}) = (13)(24)
$\mathcal{B}$({1,2},{3,4}, {K_0}, {1,4}) = (14)(23)
$\mathcal{B}$({1,2}|{3,4}, {K, K}, {1,2}|{3,4}) = (12)(34)
$\mathcal{B}$({1,2}|{3,4}, {K, K_0}, {1,2}|{3}) = (12)(3)(4)
$\mathcal{B}$({1,2}|{3,4}, {K_0, K}, {1}|{3,4}) = (1)(2)(34)
$\mathcal{B}$({1,2}|{3,4}, {K_0, K_0}, {1}|{3}) = (1)(2)(3)(4)
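Both counts in this example can be confirmed by brute force. The following Python sketch (our own check, not the method of Theorem 9) tests all 15 nonempty subsets of $X$ for the block property and all 15 set partitions of $X$ for compatibility under $K$; the permutation dictionaries and helper names are illustrative.

```python
from itertools import combinations

X = [1, 2, 3, 4]
e = {1: 1, 2: 2, 3: 3, 4: 4}
a = {1: 2, 2: 1, 3: 4, 4: 3}          # (1 2)(3 4)
K = [e, a]

def image(g, S):
    return frozenset(g[s] for s in S)

def is_block(B, G):
    return all(image(g, B) == B or not (image(g, B) & B) for g in G)

def set_partitions(items):
    """Yield every partition of the list `items` as a list of lists."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

# Nonempty subsets of X that are blocks for K
blocks = [frozenset(S) for r in range(1, len(X) + 1)
          for S in combinations(X, r) if is_block(frozenset(S), K)]

# Partitions of X into blocks that K permutes among themselves
systems = []
for p in set_partitions(X):
    parts = {frozenset(part) for part in p}
    if all(is_block(B, K) for B in parts) and all(image(g, B) in parts for g in K for B in parts):
        systems.append(parts)

print(len(blocks))   # 11
print(len(systems))  # 7
```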

Let $\tau \in T_X$ be a tree fixed by $G$ in its action on $T_X$. If $U$ denotes the set of children of the root of $\tau$, recall that Lemma 8 states that the set $U$ of labels is a block system. Recall that the label of a vertex is the set of labels of its leaf descendants, and also that the label of a vertex is the union of the labels of its children. According to Theorem 9, any compatible block system is of the form

$$\mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q}) = \bigcup_{i=1}^{k} \{\, r(Q_i) : r \in C_{H_i} \,\}.$$

We will use the notation $\tau_{rQ}$ to denote the subtree $\tau_u$, $u \in U$, $u = r(Q)$, rooted at $u$. Theorem 9 leads to the characterization of assembly trees fixed by a given group $G$ as stated in Corollary 11 below.

Corollary 11 Let us assume that $G$ acts simply on $X$ and that $\tau \in T_X$. Let $U$ be the set of children of the root of $\tau$ and, for each $u \in U$, let $\tau_u$ be the rooted, labeled subtree of $\tau$ that consists of root $u$ and all its descendants. With notation as in Theorem 9, the tree $\tau$ is fixed by $G$ if and only if, for some $\Pi$, $\mathcal{H}$ and $\mathcal{Q}$, the following two conditions hold.

1. $U = \mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$; hence for each $Q \in \mathcal{Q}$ and $g \in G$, there is a subtree $\tau_Q$ and a subtree $\tau_{gQ}$.

2. $\tau_{gQ} = g(\tau_Q)$ for every $Q \in \mathcal{Q}$ and every $g \in G$.

Proof: Let us assume that $\tau$ is fixed by $G$. Condition (1) follows immediately from Lemma 8 and Theorem 9. Concerning Condition (2), for any $g \in G$, the set of leaves of $g(\tau_Q)$ is $g(Q)$. Hence for $\tau$ to be fixed by $G$ it is necessary that $g(\tau_Q) = \tau_{gQ}$.

Conversely, let us assume that Conditions (1) and (2) hold. For any $g \in G$ we must show that $g(\tau) = \tau$. By Condition (1), it is sufficient to show that $g$ acting on $\tau$ permutes the set of subtrees in such a manner that $g(\tau_{rQ}) = \tau_{grQ}$ for every $H \in \mathcal{H}$, $Q$ the corresponding element of $\mathcal{Q}$, and every $r \in C_H$. However, by Condition (2),

$$g(\tau_{rQ}) = gr(\tau_Q) = \tau_{grQ}. \quad \square$$

Theorem 13 below states that the following recursive procedure constructs any assembly tree $\tau \in T_X$ fixed by $G$. This will be used to prove Theorem 16 in the next section.

Procedure 12 Recursive construction of any assembly tree fixed by a group $G$:

(1) Partition the set $\mathcal{O}$ of $G$-orbits of $X$: $\Pi = \{\pi_1, \pi_2, \ldots, \pi_k\}$. Note that the parts of $\Pi$ are labeled $1, 2, \ldots, k$ in some arbitrary way.

(2) For each $i = 1, 2, \ldots, k$, choose a subgroup $H_i \le G$. (If $\Pi$ has only one part, then $H_1 = G$ is not allowed.)

(3) For each $i$, choose a single orbit of $H_i$ acting on each of the $G$-orbits in $\pi_i$, and let $Q_i$ be the union of these $H_i$-orbits.

(4) Recursively, let $\tau_{Q_i}$ be any rooted tree whose leaves are labeled by $Q_i$ and which is fixed by $H_i$.

(5) Let $S_i = \{ r(\tau_{Q_i}) \mid r \in C_{H_i} \}$ and $S = \bigcup_{i=1}^{k} S_i$.

(6) Let $\tau$ be the rooted tree whose children are the roots of the trees in $S$.
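The following Python sketch follows Steps (1) to (6) literally for small permutation groups acting simply on a set of integers. It is an illustration under our own assumptions, not an optimized implementation: subgroups are found by brute force, trees are nested `frozenset`s with integer leaves, and the uniqueness rules of Remark 14 are not enforced, so repeated constructions of the same tree simply collapse in a Python set. All function names are hypothetical.

```python
from itertools import combinations, product

def compose(g, h):
    return {x: g[h[x]] for x in h}                  # apply h first, then g

def key(g):
    return frozenset(g.items())                     # hashable form of a permutation

def act(g, tree):
    return g[tree] if isinstance(tree, int) else frozenset(act(g, t) for t in tree)

def closure(gens, identity):
    """Subgroup generated by gens (in a finite group, products of generators suffice)."""
    elems, frontier = {key(identity): identity}, [identity]
    while frontier:
        new = []
        for x in frontier:
            for s in gens:
                y = compose(x, s)
                if key(y) not in elems:
                    elems[key(y)] = y
                    new.append(y)
        frontier = new
    return list(elems.values())

def subgroups(G):
    """All subgroups of a small group, by brute force over subsets of generators."""
    identity, found = {x: x for x in G[0]}, {}
    for r in range(len(G) + 1):
        for gens in combinations(G, r):
            H = closure(gens, identity)
            found[frozenset(key(h) for h in H)] = H
    return list(found.values())

def orbits(G, S):
    seen, result = set(), []
    for x in sorted(S):
        if x not in seen:
            orbit = frozenset(g[x] for g in G)
            seen |= orbit
            result.append(orbit)
    return result

def left_coset_reps(G, H):
    reps, covered = [], set()
    for g in G:
        if key(g) not in covered:
            reps.append(g)
            covered |= {key(compose(g, h)) for h in H}
    return reps

def set_partitions(items):
    if not items:
        yield []
        return
    for p in set_partitions(items[1:]):
        for i in range(len(p)):
            yield p[:i] + [[items[0]] + p[i]] + p[i + 1:]
        yield [[items[0]]] + p

def fixed_trees(G, X):
    """All assembly trees on X fixed by G, assuming G acts simply on X."""
    if len(X) == 1:
        return {next(iter(X))}                                   # a single leaf
    whole, subs, trees = frozenset(key(g) for g in G), subgroups(G), set()
    for parts in set_partitions(orbits(G, X)):                   # Step (1)
        for Hs in product(subs, repeat=len(parts)):              # Step (2)
            if len(parts) == 1 and frozenset(key(h) for h in Hs[0]) == whole:
                continue
            choices = [[frozenset().union(*pick)                 # Step (3)
                        for pick in product(*(orbits(H, O) for O in part))]
                       for part, H in zip(parts, Hs)]
            reps = [left_coset_reps(G, H) for H in Hs]
            for Qs in product(*choices):
                subtrees = [fixed_trees([{x: h[x] for x in Q} for h in H], Q)
                            for H, Q in zip(Hs, Qs)]             # Step (4)
                for taus in product(*subtrees):                  # Steps (5) and (6)
                    trees.add(frozenset(act(r, t) for t, R in zip(taus, reps) for r in R))
    return trees

e = {1: 1, 2: 2, 3: 3, 4: 4}
a = {1: 2, 2: 1, 3: 4, 4: 3}                        # (1 2)(3 4)
b = {1: 3, 2: 4, 3: 1, 4: 2}                        # (1 3)(2 4)
c = compose(a, b)                                   # (1 4)(2 3)
X = frozenset({1, 2, 3, 4})
print(len(fixed_trees([e], X)))                     # 26
print(len(fixed_trees([e, a], X)))                  # 6
print(len(fixed_trees([e, a, b, c], X)))            # 4
```

The recursion terminates because Step (2) forbids the single case in which $Q_1$ would be all of $X$.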

Theorem 13 The set of assembly trees constructed by Procedure 12 is the set of assembly trees fixed by the group $G$.

Proof: In the notation of Theorem 9, Steps (1), (2), and (3) are choosing $(\Pi, \mathcal{H}, \mathcal{Q})$. Steps (4), (5), and (6) are ensuring that $U = \mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$. Note that the restriction in Step (2) is because otherwise the root of the resulting tree in Step (6) would have only one child. Note also that in Step (5), $S_i$ does not depend on the particular set of coset representatives. This follows directly from Step (4).

It is now sufficient to show the following. For any assembly tree $\tau$ satisfying $U = \mathcal{B}(\Pi, \mathcal{H}, \mathcal{Q})$ for some $(\Pi, \mathcal{H}, \mathcal{Q})$, Condition (2) in Corollary 11 holds if and only if $\tau$ is constructed by Procedure 12. To show that any assembly tree $\tau$ constructed by Procedure 12 satisfies Condition (2), note that Step (4) implies that, if $Q \in \mathcal{Q}$ corresponds to $H \in \mathcal{H}$, then $H(Q) = Q$ and hence $h(\tau_Q) = \tau_{hQ} = \tau_Q$ for all $h \in H$. For $g \in G$, if $g = rh$, where $h \in H$, then $g(\tau_Q) = rh(\tau_Q) = r(\tau_{hQ}) = r(\tau_Q) = \tau_{rQ}$, the last equality from Step (5). Again, because $H(Q) = Q$, we have $g(\tau_Q) = \tau_{rQ} = \tau_{rhQ} = \tau_{gQ}$.

Conversely, if $\tau$ satisfies Condition (2) in Corollary 11, then consider the trees $\tau_{Q_i}$, $i = 1, 2, \ldots, k$. These are trees whose leaves are labeled by $Q_i$, as in Step (4) of Procedure 12. Moreover, by Condition (2) we have $h(\tau_{Q_i}) = \tau_{hQ_i} = \tau_{Q_i}$ for all $h \in H_i$, so $\tau_{Q_i}$ is fixed by $H_i$. By Step (5) of Procedure 12 and Condition (2) of Corollary 11, we have $r(\tau_{Q_i}) = \tau_{rQ_i}$ for all $r \in C_{H_i}$. Therefore the tree $\tau$ is constructed by Procedure 12. □

Remark 14 Enforcing uniqueness in the construction.

The construction in Procedure 12 is not unique, in that it may produce the same fixed assembly tree multiple times depending on the choices in Steps (2) and (3). Statement (3) in Theorem 9 shows that we may enforce uniqueness if we make the following two restrictions.

(a) If we choose $(\Pi, \mathcal{H}, \mathcal{Q}) = \{(\pi_1, H_1, Q_1), \ldots, (\pi_k, H_k, Q_k)\}$ in Steps (1) and (2) of the procedure while constructing a tree $\tau$, and if we also have $(\Pi, \mathcal{H}', \mathcal{Q}') = \{(\pi_1, H'_1, Q'_1), \ldots, (\pi_k, H'_k, Q'_k)\}$ during the construction of another tree $\tau'$, then to ensure that $\tau \neq \tau'$ we need to ensure that for at least one $i$, the group $H'_i$ is not conjugate to $H_i$ in $G$.

(b) Consider the construction of two trees $\tau$ and $\tau'$ with corresponding $(\Pi, \mathcal{H}, \mathcal{Q})$ and $(\Pi, \mathcal{H}', \mathcal{Q}')$ such that for each $i$, the subgroup $H_i$ is a conjugate of the subgroup $H'_i$. Further assume that in Step (3) for the tree $\tau$ the element $g_i \in G$ is such that $g_i H_i g_i^{-1} = H'_i$. Then while constructing tree $\tau'$, we need to ensure that there is at least one $i$ such that $Q'_i \neq g_i(Q_i)$. (Note that for a given index $i$, there may well be several elements $g_i \in G$ so that $g_i H_i g_i^{-1} = H'_i$ holds, and all those are subject to this restriction.)
Example 15 Klein 4-group acting on X (continued).

With $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$ acting on $X = \{1, 2, 3, 4\}$, consider the assembly trees $\tau$ fixed by the subgroup $K = \{(1)(2)(3)(4), (1\,2)(3\,4)\}$. There are exactly six such trees, those in the orbits A, B, C, D, E of Figure 5. These correspond (though not in the same order) to the block systems in Example 10. Because of the restriction in Step (2) of Procedure 12, the first block system in the list in Example 10 is ignored.
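These counts can be cross-checked without any group theory by listing every assembly tree on $\{1, 2, 3, 4\}$ and testing each one directly. The following Python sketch does this under our own illustrative conventions (trees as nested `frozenset`s with integer leaves, permutations as dictionaries); it also counts the trees fixed by $K$ but not by all of $G$.

```python
from itertools import product

def set_partitions(items):
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in set_partitions(rest):
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        yield [[first]] + p

def all_trees(labels):
    """Every rooted tree with leaves labeled by `labels` in which each internal
    vertex has at least two children: split the labels into >= 2 parts and recurse."""
    labels = sorted(labels)
    if len(labels) == 1:
        return [labels[0]]
    trees = []
    for p in set_partitions(labels):
        if len(p) >= 2:
            trees.extend(frozenset(kids) for kids in product(*(all_trees(part) for part in p)))
    return trees

def act(g, tree):
    return g[tree] if isinstance(tree, int) else frozenset(act(g, t) for t in tree)

def fixed_by(G, trees):
    return [t for t in trees if all(act(g, t) == t for g in G)]

e = {1: 1, 2: 2, 3: 3, 4: 4}
a = {1: 2, 2: 1, 3: 4, 4: 3}    # (1 2)(3 4)
b = {1: 3, 2: 4, 3: 1, 4: 2}    # (1 3)(2 4)
c = {1: 4, 2: 3, 3: 2, 4: 1}    # (1 4)(2 3)
K, G = [e, a], [e, a, b, c]

trees = all_trees([1, 2, 3, 4])
fixed_K, fixed_G = fixed_by(K, trees), fixed_by(G, trees)
only_K = [t for t in fixed_K if t not in fixed_G]
print(len(trees), len(fixed_K), len(fixed_G), len(only_K))   # 26 6 4 2
```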

4.1 When the action of G on X is not simple

Let us assume that $G$ acts on $X$, but not necessarily simply. For $q \in X$, let $S_q$ denote the stabilizer of $q$ in $G$. For a subset $Q \subseteq X$, let

$$S_Q = \bigcup_{q \in Q} S_q.$$

If $G$ acts simply on $X$, then the stabilizer of any $x \in X$ is the trivial subgroup. Therefore, in this case, it is clear that $S_Q \subseteq H$ for any $Q \subseteq X$ and $H \le G$. In the general case, when $G$ does not necessarily act simply on $X$, let us call a pair $(H, Q)$ viable if

$$S_Q \subseteq H.$$

If only viable pairs $(H_i, Q_i)$ are allowed in the hypothesis of Theorem 9, then the theorem is valid in the general, not necessarily simple, case. Since this general version of Theorem 9 and associated analogs of Procedure 12 and Theorem 13 are not needed in subsequent sections, and the proofs are relatively straightforward extensions, we omit them.
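To make the viability condition concrete, here is a small Python sketch with a toy, non-simple action that we made up for illustration (it does not appear in the paper): the group generated by the transposition $(1\,2)$ acting on $\{1, 2, 3\}$, where the point 3 is fixed by every element. Permutations are dictionaries and the helper names are hypothetical.

```python
def key(g):
    return frozenset(g.items())

def stabilizer(G, q):
    """S_q: the elements of G that fix the point q."""
    return [g for g in G if g[q] == q]

def S(G, Q):
    """S_Q: the union of the stabilizers S_q over q in Q (as permutation keys)."""
    return {key(g) for q in Q for g in stabilizer(G, q)}

def viable(G, H, Q):
    """(H, Q) is viable when S_Q is contained in H."""
    return S(G, Q) <= {key(h) for h in H}

e = {1: 1, 2: 2, 3: 3}
s = {1: 2, 2: 1, 3: 3}              # the transposition (1 2); the point 3 is fixed by all of G
G, trivial = [e, s], [e]

print(viable(G, trivial, {1}))      # True
print(viable(G, trivial, {1, 3}))   # False: S_3 = G is not contained in the trivial group
print(viable(G, G, {1, 3}))         # True
```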



5 Enumerating Fixed Assembly Trees 

Let us assume in this section that $G$ acts simply on each of an infinite sequence $X_1, X_2, \ldots$ of sets where, by formula (1), we have $|X_n| = n|G|$. In other words, $n$ is the number of orbits of $G$ in its action on $X_n$. Denote by $t_n(G)$ the number of trees in $T_n := T_{X_n}$ that are fixed by $G$. In this section we provide a formula for the exponential generating function

$$f_G(x) := \sum_{n \ge 1} t_n(G) \frac{x^n}{n!}$$

for the sequence $(t_n(G))_{n \ge 1}$. If $G$ is the trivial group of order one, then let us denote this generating function simply by $f(x)$. This is the generating function for the total number of rooted, labeled trees with $n$ leaves in which every non-leaf vertex has at least two children. For $H \le G$, let $f_H(x)$ denote the analogous generating function for the numbers $t_n(H)$ of trees fixed by $H$, where $n$ is now the number of $H$-orbits.
Theorem 16 The generating function $f_G(x)$ satisfies the following functional equations:

$$1 - x + 2f(x) = \exp(f(x)),$$

and for $|G| > 1$,

$$1 + 2f_G(x) = \exp\left( \sum_{H \le G} \frac{f_H((G:H)x)}{(G:H)} \right).$$

Proof: The first formula is proved in [20], page 13. For $|G| > 1$, we use the standard exponential and product formulas for generating functions.

The proof of the second formula uses two well-known results from the theory of exponential generating functions, the "product formula" and the "exponential formula". In Procedure 12, give Steps (3) and (4) the name putting an $H_i$-structure on $\pi_i$. According to Theorem 13, the number $t_n(G)$ of trees fixed by $G$ equals the number of ways to partition the set of orbits of $G$ acting on $X_n$ and to place an $H$-structure on each part in the partition, for some subgroup $H \le G$, keeping the uniqueness Remark 14 in mind.

In Step (3) of Procedure 12, since $G$ acts simply and the number of $H_i$-orbits in one $G$-orbit is $|G|/|H_i| = (G : H_i)$, the number of possible choices for $Q_i$ (the union of these single $H_i$-orbits) is $(G : H_i)^{|\pi_i|}$. Hence, in accordance with Step (4) of Procedure 12, the generating function for the number of ways to place an $H$-structure is basically $f_H((G : H)x)$.

However, this must be altered in accordance with the uniqueness requirements in Remark 14. Let $N$ denote a set consisting of one representative of each conjugacy class in the set of subgroups of $G$. By Statement (a) in Remark 14, only subgroups in $N$ are considered. Let $N(H) := \{g \in G \mid gHg^{-1} = H\}$ denote the normalizer of $H$ in $G$. By Statement (b), there has to be an index $i$ so that $g(Q_i) \neq Q'_i$. However, $g(Q_i) = g'(Q_i)$ will occur for every $i$ if and only if $g$ and $g'$ are in the same coset of $H$ in $G$. Therefore, the generating function for the number of ways to place an $H$-structure is $\frac{1}{(N(H):H)} f_H((G : H)x)$.

The exponential formula states that the generating function $g_H(x) = \sum_{n \ge 0} a_n \frac{x^n}{n!}$ for the number $a_n$ of ways to partition the set of $G$-orbits acting on $X_n$ and, on each part $\pi$ in the partition, place an $H$-structure (same $H$) is

$$g_H(x) := \exp\left( \frac{1}{(N(H):H)} f_H((G:H)x) \right).$$

Here we assume that $a_0 = 1$.

The generating function for the number of ways to partition the set of orbits, i.e., choose $\Pi = \{\pi_1, \pi_2, \ldots, \pi_k\}$ and, on each part of the partition, place an $H$-structure, one $H$ from each conjugacy class in $N$, is

$$\prod_{H \in N} g_H(x) = \prod_{H \in N} \exp\left( \frac{1}{(N(H):H)} f_H((G:H)x) \right).$$

Note that we have not taken the restriction in Step (2) of Procedure 12 into consideration. Taking the partition of the orbit set into just one part and placing on that part a $G$-structure results in counting the number of fixed trees a second time; since $(N(G):G) = (G:G) = 1$, these configurations contribute exactly $f_G(x)$. Also, since the constant term in $\prod_{H \in N} g_H(x)$ is 1,

$$1 + 2f_G(x) = \exp\left( \sum_{H \in N} \frac{f_H((G:H)x)}{(N(H):H)} \right) = \exp\left( \sum_{H \le G} \frac{f_H((G:H)x)}{(G:H)} \right).$$

Here the last equality holds because $f_H$ depends only on the conjugacy class of $H$ in $G$, and

$$\frac{1/(N(H):H)}{1/(G:H)} = \frac{(G:H)}{(N(H):H)} = (G : N(H)),$$

which is the number of subgroups of $G$ conjugate to $H$. □



Example 17 Klein 4-group acting on X (continued).

Consider $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$ acting on $X_n$. Recall that $|X_n| = 4n$, the integer $n$ being the number of $G$-orbits. In this case $N = \{K_0, K_1, K_2, K_3, G\}$, where $K_0$ is the trivial group and

$K_1 = \{(1)(2)(3)(4), (1\,2)(3\,4)\}$,
$K_2 = \{(1)(2)(3)(4), (1\,3)(2\,4)\}$,
$K_3 = \{(1)(2)(3)(4), (1\,4)(2\,3)\}$.

The functional equations in the statement of Theorem 16 are

$$1 - x + 2f(x) = \exp(f(x)),$$

$$1 + 2f_{K_i}(x) = \exp\left( \tfrac{1}{2} f(2x) + f_{K_i}(x) \right) \quad \text{for } i = 1, 2, 3, \text{ and}$$

$$1 + 2f_G(x) = \exp\left( \tfrac{1}{4} f(4x) + \tfrac{1}{2} f_{K_1}(2x) + \tfrac{1}{2} f_{K_2}(2x) + \tfrac{1}{2} f_{K_3}(2x) + f_G(x) \right).$$

Using these equations and MAPLE software, the coefficients of the respective generating functions provide the following first few values for the number of fixed assembly trees. For the first entry $t_1(G) = 4$ for the group $G$, the four fixed trees are shown in Figure 5: A, B, C, D. For trees with eight leaves there are $t_2(G) = 104$ assembly trees fixed by $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$, and so on.

$t_n(K_0)$: 1, 1, 4, 26, 236, 2752
$t_n(K_1)$: 1, 6, 72, 1312, 32128, 989696
$t_n(G)$: 4, 104, 4896, 341120, 31945728, 3790876672
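The first terms of this table can also be obtained without MAPLE. The following Python sketch, which is our own and uses exact rational arithmetic from the standard `fractions` module, solves each functional equation of Theorem 16 coefficient by coefficient. Writing $S = A + h$ and $E = \exp(S)$, the recurrence $nE_n = \sum_{k=1}^{n} k S_k E_{n-k}$ separates the unknown coefficient $h_n$ from terms of lower order. The function names and the truncation order `N` are illustrative choices.

```python
from fractions import Fraction
from math import factorial

N = 6  # number of coefficients to compute

def solve(A, shift=0):
    """EGF coefficients h[0..N] of the solution of 1 - shift*x + 2h(x) = exp(A(x) + h(x)),
    with h(0) = 0. Matching x^n gives h_n = A_n + P_n (+ shift when n = 1), where P_n is
    the part of [x^n] exp(A + h) that involves only coefficients of order < n."""
    h = [Fraction(0)] * (N + 1)
    S = [Fraction(0)] * (N + 1)             # S = A + h
    E = [Fraction(1)] + [Fraction(0)] * N   # E = exp(S)
    for n in range(1, N + 1):
        P = sum((k * S[k] * E[n - k] for k in range(1, n)), Fraction(0)) / n
        h[n] = A[n] + P + (shift if n == 1 else 0)
        S[n] = A[n] + h[n]
        E[n] = P + S[n]
    return h

def scaled(f, c):
    """Coefficients of f(c x) / c."""
    return [f[n] * c**n / c for n in range(N + 1)]

def counts(f):
    """t_n = n! [x^n] f(x) for n = 1..N."""
    return [int(factorial(n) * f[n]) for n in range(1, N + 1)]

zero = [Fraction(0)] * (N + 1)
f = solve(zero, shift=1)                                  # trivial group K_0
fK = solve(scaled(f, 2))                                  # each K_i (order 2)
fG = solve([u + 3 * v for u, v in zip(scaled(f, 4), scaled(fK, 2))])  # G = Z_2 + Z_2
print(counts(f))   # [1, 1, 4, 26, 236, 2752]
print(counts(fK))
print(counts(fG))
```

The second and third printed lists can be compared entry by entry with the rows for $t_n(K_1)$ and $t_n(G)$; increasing `N` extends all three sequences.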
Theorem 16 provides the generating function for the numbers $t_n(H)$ of fixed assembly trees in the action of any subgroup $H \le G$ on $X_n$. What is required for Problem (i) described in Sections 1 and 2 are the numbers $\tilde t_n(H)$ of assembly trees that are fixed by $H$, but by no other elements of $G$. In Example 15, for $G = \mathbb{Z}_2 \oplus \mathbb{Z}_2$ acting on $X = \{1, 2, 3, 4\}$, there are six trees that are fixed by the subgroup $K = \{(1)(2)(3)(4), (1\,2)(3\,4)\}$. However, of these six, four (A, B, C, and D in Figure 5) are also fixed by $G$. Therefore there are only two assembly trees fixed by $K$ and no other elements of $G$ (these are E and F in Figure 5). In general, as shown in Theorem 2, Möbius inversion [22] can be used to calculate the values of $\tilde t_X(H)$ from the values of $t_X(H)$.
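For the Klein 4-group on $X = \{1, 2, 3, 4\}$ the inversion can be carried out by hand, and the following Python sketch does it mechanically. It assumes the standard form of Möbius inversion on the subgroup lattice: a tree fixed by $H$ has stabilizer some $J \ge H$, so $t(H) = \sum_{J \ge H} \tilde t(J)$ and hence $\tilde t(H) = \sum_{J \ge H} \mu(H, J)\, t(J)$ (we take this to be what Theorem 2 provides). The input counts are $t_4(K_0) = 26$, $t_2(K_i) = 6$ and $t_1(G) = 4$ from Example 17; the string labels for subgroups are our own.

```python
from functools import lru_cache

# The subgroup lattice of G = Z_2 + Z_2: K0 < K1, K2, K3 < G.
SUBGROUPS = ("K0", "K1", "K2", "K3", "G")

def leq(H, J):
    """Inclusion order on the five subgroups."""
    return H == J or H == "K0" or J == "G"

@lru_cache(maxsize=None)
def mu(H, J):
    """Mobius function: mu(H, H) = 1 and the sum of mu(H, L) over H <= L <= J is 0 for H < J."""
    if H == J:
        return 1
    if not leq(H, J):
        return 0
    return -sum(mu(H, L) for L in SUBGROUPS if leq(H, L) and leq(L, J) and L != J)

# Trees on X = {1, 2, 3, 4} fixed by each subgroup (values from Example 17)
t = {"K0": 26, "K1": 6, "K2": 6, "K3": 6, "G": 4}

# Trees whose stabilizer is exactly H
exact = {H: sum(mu(H, J) * t[J] for J in SUBGROUPS if leq(H, J)) for H in SUBGROUPS}
print(exact)  # {'K0': 16, 'K1': 2, 'K2': 2, 'K3': 2, 'G': 4}
```

The value 2 for each $K_i$ matches the two trees E and F, and the exact counts add up to all 26 trees.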

6 The Icosahedral Group 

For completeness, the results of the previous sections are applied to the motivating T = 1 viral example. An isometry of 3-space is a bijective transformation that preserves length, and an isometry is called direct if it is orientation preserving. Rotations, for example, are direct, while reflections are not. A symmetry of a polyhedron is an isometry that keeps the polyhedron, as a whole, fixed, and a direct symmetry is similarly defined. The icosahedral group is the group of direct symmetries of the icosahedron. It is a group of order 60 denoted $G_{60}$.

As mentioned earlier, the viral capsid is modeled by a polyhedron $P$ with icosahedral symmetry, whose set $X$ of facets represents the protein monomers. The icosahedral group acts on $P$ and hence on the set $X$. It follows from the quasi-equivalence theory of the capsid structure that $G_{60}$ acts simply on $X$. Formula (1) shows that $|X| := |X_n| = 60n$, where $n$ is the number of orbits. Not every $n$ is possible for a viral capsid; $n$ must be a T-number as defined in the introduction. Before the number of orbits of each size for the action of $G_{60}$ on the set $T_n := T_{X_n}$ of assembly trees can be determined, basic information about the icosahedral group is needed.

The group $G_{60}$ consists of:

• the identity, 

• 15 rotations of order 2 about axes that pass through the midpoints of pairs of 
diametrically opposite edges of P, 

• 20 rotations of order 3 about axes that pass through the centers of diametrically 
opposite triangular faces, and 

• 24 rotations of order 5 about axes that pass through diametrically opposite ver- 
tices. 

There are 59 subgroups of $G_{60}$ that play a crucial role in the theory. Besides the two trivial subgroups, they are the following:

• 15 subgroups of order 2, each generated by one of the rotations of order 2,

• 10 subgroups of order 3, each generated by one of the rotations of order 3,

• 5 subgroups of order 4, each generated by rotations of order 2 about perpendicular axes,

• 6 subgroups of order 5, each generated by one of the rotations of order 5, 

• 10 subgroups of order 6, each generated by a rotation of order 3 about an axis L 
and a rotation of order 2 that reverses L, 

• 6 subgroups of order 10, each generated by a rotation of order 5 about an axis L 
and a rotation of order 2 that reverses L, 

• 5 subgroups of order 12, each the symmetry group of a regular tetrahedron in- 
scribed in P. 
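These counts can be checked computationally. The following Python sketch relies on an assumption that is standard but not made in the text: the icosahedral rotation group is isomorphic to the alternating group $A_5$, so we model $G_{60}$ abstractly as the even permutations of five points. This checks the group-theoretic counts only, not the geometric descriptions of the axes. Since every subgroup of $A_5$ is generated by at most two elements, enumerating the subgroups generated by pairs finds all of them.

```python
from collections import Counter
from itertools import permutations

# Assumption of this sketch: G_60 is modeled as A_5, the even permutations of
# {0,...,4}; a permutation is a tuple p with p[i] the image of i.
IDENTITY = tuple(range(5))
A5 = [p for p in permutations(range(5))
      if sum(p[i] > p[j] for i in range(5) for j in range(i + 1, 5)) % 2 == 0]

def mul(p, q):
    return tuple(p[q[i]] for i in range(5))        # apply q first, then p

def order(p):
    k, x = 1, p
    while x != IDENTITY:
        x, k = mul(x, p), k + 1
    return k

def generated(gens):
    """The subgroup generated by gens (closure under multiplication suffices)."""
    elems, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        new = {mul(x, s) for x in frontier for s in gens} - elems
        elems |= new
        frontier = list(new)
    return frozenset(elems)

# Every subgroup of A_5 is generated by at most two elements, so pairs suffice.
subgroups = {generated((a, b)) for i, a in enumerate(A5) for b in A5[i:]}

print(sorted(Counter(order(p) for p in A5).items()))  # [(1, 1), (2, 15), (3, 20), (5, 24)]
print(len(subgroups))                                  # 59
print(sorted(Counter(len(H) for H in subgroups).items()))
# [(1, 1), (2, 15), (3, 10), (4, 5), (5, 6), (6, 10), (10, 6), (12, 5), (60, 1)]
```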

From the above geometric description of the subgroups, it follows that all subgroups of a given order are conjugate in the group $G_{60}$. Representatives of the conjugacy classes of the subgroups of the icosahedral group are denoted by $G_1, G_2, G_3, G_4, G_5, G_6, G_{10}, G_{12}, G_{60}$, where the subscript is the order of the group. The set of subgroups of $G_{60}$ forms a lattice, ordered by inclusion. A partial Hasse diagram for this lattice $L$ is shown in Figure 10. The number on the edge joining $G_i$ (below) and $G_j$ (above) indicates the number of distinct subgroups of order $i$ contained in each subgroup of order $j$. The number in parentheses on the edge joining $G_i$ (below) and $G_j$ (above) indicates the number of distinct subgroups of order $j$ containing each subgroup of order $i$. It is well known that any finite partially ordered set $P$ admits a Möbius function $\mu : P \times P \to \mathbb{Z}$. The Möbius function of $L$ is shown in Table 1. The entry in the table corresponding to the row labeled $G_i$ and column $G_j$ is $\mu(G_i, G_j)$.
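The Möbius values can be recomputed from the subgroup lattice itself. The following Python sketch again assumes the standard isomorphism of $G_{60}$ with $A_5$ (a modeling choice, not stated in the text), enumerates all subgroups, and evaluates $\mu(H, K)$ through the defining recursion $\mu(H, H) = 1$ and $\mu(H, K) = -\sum_{H \le J < K} \mu(H, J)$. It prints the rows of the table for $G_1$ and for one subgroup of order 2; keying results by subgroup order is safe for these two rows because all subgroups of a given order above the chosen bottom element give the same value.

```python
from itertools import permutations

# Assumption of this sketch: G_60 is modeled abstractly as A_5.
IDENTITY = tuple(range(5))
A5 = [p for p in permutations(range(5))
      if sum(p[i] > p[j] for i in range(5) for j in range(i + 1, 5)) % 2 == 0]

def mul(p, q):
    return tuple(p[q[i]] for i in range(5))        # apply q first, then p

def generated(gens):
    elems, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        new = {mul(x, s) for x in frontier for s in gens} - elems
        elems |= new
        frontier = list(new)
    return frozenset(elems)

# Every subgroup of A_5 is generated by at most two elements.
subgroups = {generated((a, b)) for i, a in enumerate(A5) for b in A5[i:]}

def mobius_row(bottom):
    """mu(bottom, K) for every subgroup K containing bottom, keyed by the order |K|."""
    above = sorted((K for K in subgroups if bottom <= K), key=len)
    mu = {}
    for K in above:                                 # smaller subgroups are handled first
        mu[K] = 1 if K == bottom else -sum(mu[J] for J in above if J < K)
    return {len(K): value for K, value in mu.items()}

trivial = frozenset({IDENTITY})
c2 = next(H for H in subgroups if len(H) == 2)
print(mobius_row(trivial))  # {1: 1, 2: -1, 3: -1, 4: 2, 5: -1, 6: 3, 10: 5, 12: 4, 60: -60}
print(mobius_row(c2))       # {2: 1, 4: -1, 6: -1, 10: -1, 12: 0, 60: 4}
```

The value $\mu(G_1, G_{60}) = -60$ agrees with the classical result of P. Hall for $A_5$.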

For $|X| = 60$, i.e., for the T = 1 polyhedral case, using Theorem 16 and MAPLE software, the generating functions $f_{G_i}(x)$ were computed, and hence their coefficients $t_{60/i}(G_i)$, which count the number of assembly trees that are fixed by $G_i$, were also computed. Note that since $|X| = 60$, the number of orbits of $G_i$ in its action on $X$ is $60/i$. Substituting these values into Theorem 2 and using the Möbius function in Table 1 yields the following numerical values for $\tilde t_{60/i}(G_i)$, the number of assembly trees over $X$ with $|X| = 60$ that are fixed by $G_i$ but by no other elements of $G_{60}$. In other words, these are the numbers of trees whose stabilizer in $G_{60}$ is $G_i$.

Figure 10: Partial Hasse diagram for the lattice of subgroups of the icosahedral group.



$\tilde t_{60}(G_1)$: 19244655101324373947201847309221875711203467545322366329965115755432139023628289410324670840066578537680
$\tilde t_{30}(G_2)$: 1670856367100496379411587456529324583988755126499875584
$\tilde t_{20}(G_3)$: 10087157294451731428720995944759704
$\tilde t_{15}(G_4)$: 10041342673530270014535171213312
$\tilde t_{12}(G_5)$: 20540071766413107840
$\tilde t_{10}(G_6)$: 61346927354448105268
$\tilde t_{6}(G_{10})$: 223503950260
$\tilde t_{5}(G_{12})$: 16865654580
$\tilde t_{1}(G_{60})$: 204
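The last entry can be checked by hand. Comparing the coefficients of $x$ on both sides of the second functional equation in Theorem 16 gives $2t_1(G) = \sum_{H \le G} t_1(H)$, that is, $t_1(G) = \sum_{H < G} t_1(H)$ over all proper subgroups, with $t_1 = 1$ for the trivial group (a single leaf). The following Python sketch evaluates this recursion. The table of how many subgroups of each order lie inside one subgroup of each order is our own input, taken from the standard structure of $C_2, C_3, V_4, C_5, S_3, D_5, A_4$ and $A_5$, rather than read off Figure 10.

```python
# contains[j][i] = number of subgroups of order i inside one subgroup of order j
# (proper subgroups only), for the subgroups of the icosahedral group.
contains = {
    1: {},
    2: {1: 1},
    3: {1: 1},
    4: {1: 1, 2: 3},
    5: {1: 1},
    6: {1: 1, 2: 3, 3: 1},
    10: {1: 1, 2: 5, 5: 1},
    12: {1: 1, 2: 3, 3: 4, 4: 1},
    60: {1: 1, 2: 15, 3: 10, 4: 5, 5: 6, 6: 10, 10: 6, 12: 5},
}

# t_1(H): trees on |H| leaves fixed by H, from t_1(G) = sum of t_1(H) over proper H < G
t1 = {}
for j in sorted(contains):
    t1[j] = 1 if j == 1 else sum(count * t1[i] for i, count in contains[j].items())

print(t1)      # {1: 1, 2: 1, 3: 1, 4: 4, 5: 1, 6: 5, 10: 7, 12: 12, 60: 204}
print(t1[60])  # 204
```

The same recursion applied to the Klein 4-group gives $1 + 3 = 4$, in agreement with $t_1(G) = 4$ in Example 17.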



From Theorem 3, the above numbers $\tilde t_i(G_{60/i})$ tell us the number of assembly trees with orbit size $i$, or in other words, the number of trees in an assembly pathway of size $i$. The probability of such a pathway is $i/|T_X|$.

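It is worth comparing the first and last elements of this list. While the individual pathways belonging to $G_1$ are only 60 times more probable than those that belong to $G_{60}$, there are about $10^{99}$ times more of them: $\tilde t_{60}(G_1)/60 \approx 3.2 \times 10^{101}$ such pathways, against $\tilde t_1(G_{60}) = 204$.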
         G1    G2    G3    G4    G5    G6   G10   G12   G60
G1        1    -1    -1     2    -1     3     5     4   -60
G2              1          -1          -1    -1           4
G3                    1                -1          -1     2
G4                          1                      -1
G5                                1          -1
G6                                      1                -1
G10                                           1          -1
G12                                                 1    -1
G60                                                       1

Table 1: The values of the Möbius function of the subgroup lattice of $G_{60}$.

7 Conclusion and Open Problems 

We have developed an algorithmic and combinatorial approach to a problem arising in 
the modeling of viral assembly. Our results illustrate not only that problems arising
from structural biology can be of independent mathematical interest, but also that 
mathematical methods have a direct application in structural biology. 

More specifically, we have developed techniques to analyze the probability of a capsid 
forming along a given assembly pathway. One remaining issue is how to extend these 
techniques to finding the probability of valid assembly pathways as defined in Section
1. As mentioned earlier, valid assembly trees can be defined combinatorially, using
generalized notions of connectivity of the polyhedral graph whose facets form the leaves 
of the tree. Combining such graph theoretic restrictions with our techniques will likely 
require new ingredients. A second important issue is how to extend our techniques to 
nucleation in viral shell assembly. Mathematically [3], the problem is to estimate the 
proportion of valid assembly trees that have a subtree whose leaves form a specific subset 
of facets, for example a trimer or a pentamer, in the underlying polyhedron. 

In addition to the above extensions of the theory, there is scope to tighten some 
results of the paper. For example, a finer complexity analysis for Algorithm Stabilizer 
could be based on using Sims's algorithm, strong generating sets, and the Cayley graph
for G as input. 

A study of unlabeled trees that are $G$-unfixable may lead to relevant related results. Let us say that a tree is $g$-unfixable if there is no leaf-labeling so that the resulting labeled tree is fixed by the permutation $g$, and let us say that a tree is $G$-unfixable if it is $g$-unfixable for every nontrivial element $g$ of the group $G$. These properties are interesting for at least two reasons. First, they clarify the minimum quantifiable information in a labeled tree that is needed for deciding if it is fixed by a group element $g$: if the underlying unlabeled tree is $g$-unfixable, then the information in the labeling is unnecessary to make this decision. This may lead to efficient algorithms and tight complexity bounds. Second, in the language of formal logic, these properties are likely to be monadic second-order expressible [7], [23], permitting the application of limit laws for the asymptotic probabilities of finite structures satisfying such properties.



References 

[1] M. Agbandje-McKenna, A.L. Llamas-Saiz, F. Wang, P. Tattersall and M.G. Rossmann. Functional implications of the structure of the murine parvovirus, minute virus of mice. Structure, 6:1369-1381, 1998.

[2] B. Berger and P.W. Shor. Local rules switching mechanism for viral shell geometry. Technical report, MIT-LCS-TM-527, 1995.

[3] M. Bona and M. Sitharam. Influence of symmetry on probabilities of icosahedral viral assembly pathways. Computational and Mathematical Methods in Medicine: Special issue on Mathematical Virology, Stockley and Twarock Eds, 2008.

[4] B. Berger, P. Shor, J. King, D. Muir, R. Schwartz and L. Tucker-Kellogg. Local rule-based theory of virus shell assembly. Proc. Natl. Acad. Sci. USA, 91:7732-7736, 1994.

[5] Gunnar Brinkmann and Andreas Dress. A constructive enumeration of fullerenes. Journal of Algorithms, 23:345-358, 1997.

[6] D. Caspar and A. Klug. Physical principles in the construction of regular viruses. Cold Spring Harbor Symp Quant Biol, 27:1-24, 1962.

[7] K.J. Compton. A logical approach to asymptotic combinatorics II: monadic second-order properties. J. Comb. Theory Ser. A, 50(1):110-131, 1989.

[8] W.H.E. Day. Optimal algorithms for comparing trees with labeled leaves. Journal of Classification, 2(1):7-26, 1985.

[9] M. Deza and M. Dutour. Zigzag structures of simple two-faced polyhedra. Combin. Probab. Comput., 14(1-2):31-57, 2005.

[10] M. Deza, M. Dutour, and P. W. Fowler. Zigzags, railroads, and knots in fullerenes. Chem. Inf. Comp. Sci., 44:1282-1293, 2004.

[11] P. Gawron, V. V. Nekrashevich, and V. I. Sushchanskii. Conjugacy classes of the automorphism group of a tree. Mathematical Notes, 65(6):787-790, 1999.

[12] J. E. Johnson and J. A. Speir. Quasi-equivalent viruses: a paradigm for protein assemblies. J. Mol. Biol., 269:665-675, 1997.

[13] M.H. Klin. On the number of graphs for which a given permutation group is the automorphism group (Russian). English translation: Kibernetika 5:892-870, 1973.

[14] C. J. Marzec and L. A. Day. Pattern formation in icosahedral virus capsids: the papova viruses and Nudaurelia capensis β virus. Biophys, 65:2559-2577, 1993.

[15] D. Rapaport, J. Johnson and J. Skolnick. Supramolecular self-assembly: molecular dynamics modeling of polyhedral shell formation. Comp Physics Comm, 1998.

[16] V. S. Reddy, H. A. Giesing, R. T. Morton, A. Kumar, C.B. Post, C. L. Brooks, and J. E. Johnson. Energetics of quasiequivalence: computational analysis of protein-protein interactions in icosahedral viruses. Biophys, 74:546-558, 1998.

[17] A. Seress. Permutation Group Algorithms, Cambridge University Press, 2003.

[18] M. Sitharam and M. Agbandje-McKenna. Modeling virus assembly using geometric constraints and tensegrity: avoiding dynamics. Journal of Computational Biology, 13(6):1232-1265, 2006.

[19] M. Sitharam and M. Bona. Combinatorial enumeration of macromolecular assembly pathways. In Proceedings of the International Conference on Bioinformatics and Applications. World Scientific, 2004.

[20] R. Stanley. Enumerative Combinatorics, Volume 2, Cambridge University Press, 1999.

[21] G. Valiente. Algorithms on Trees and Graphs, Springer, 2002.

[22] J.H. van Lint and R.M. Wilson. A Course in Combinatorics, Cambridge University Press, 2006.

[23] S. G. Wagner. On an identity for the cycle indices of rooted tree automorphism groups. Electronic Journal of Combinatorics, 13:450-456, 2006.

[24] A. R. Woods. Coloring rules for finite trees and probabilities of monadic second order sentences. Random Structures and Algorithms, 10(4):453-485, 1998.

[25] A. Zlotnick, R. Aldrich, J. M. Johnson, P. Ceres, and M. J. Young. Mechanisms of capsid assembly for an icosahedral plant virus. Virology, 277:450-456, 2000.

[26] A. Zlotnick. To build a virus capsid: an equilibrium model of the self assembly of polyhedral protein complexes. J. Mol. Biol., 241:59-67, 1994.

[27] A. Zlotnick, J. M. Johnson, P.W. Wingfield, S.J. Stahl, and D. Endres. A theoretical model successfully identifies features of hepatitis B virus capsid assembly. Biochemistry, 38:14644-14652, 1999.



